Friday, October 29, 2010

Deadlocks in SharePoint 2007

One of our custom-developed SharePoint applications contains code that creates SharePoint sites from a template (.stp file) and assigns custom permissions to the created site and to the lists and document libraries inside it. Because a very heavy volume of users works with this application, we ran into deadlocks on the SharePoint database server whenever multiple sites were being created at the same time and permissions were being assigned on sites, lists, and document libraries. The deadlock messages below appear in the SharePoint logs whenever multiple users create sites within a fraction of a second of each other. Sometimes the application fails during site creation and sometimes while assigning permissions.

The execution process goes as follows: sites are created on the fly through code, and permissions are initially assigned only to the currently logged-in user, so that this user can carry on uploading documents. A background thread is then created to assign permissions to the rest of the users for the created site.
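To make this flow concrete, here is a minimal sketch of it. It is written in Python rather than the C# of the real application, and `create_site` and `grant` are hypothetical callables standing in for the SharePoint object-model calls; they are not real API names.

```python
import threading


def create_site_with_permissions(create_site, grant, current_user, other_users):
    """Create a site, grant the current user at once, defer everyone else.

    `create_site` and `grant` are hypothetical stand-ins for the real
    SharePoint calls; this only illustrates the ordering of the steps.
    """
    site = create_site()

    # Synchronous step: the logged-in user can start uploading right away.
    grant(site, current_user)

    def grant_rest():
        # Background step: remaining users get their permissions later.
        for user in other_users:
            grant(site, user)

    worker = threading.Thread(target=grant_rest, name="grant-rest")
    worker.start()
    return site, worker
```

Note that the background thread keeps writing permissions after the request has returned, so permission assignments for one site can overlap with the creation of the next one.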

Error Message in the front end:

The URL "/sites/1234abcd" is invalid. It may refer to a nonexisting file or folder, or refer to a valid file or folder that is not in the current web.

Notes: In this case the site is half created, and the site collection containing it is no longer accessible. If we try to log in to the site collection, it crashes and displays one of these messages:

Value does not fall within the expected range.

Template Selection – on the user interface

Group not found.

Deadlock error in the ULS log:

Unexpected query execution failure, error code 1205. Additional error information from SQL Server is included below. "Transaction (Process ID 110) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction." Query text (if available): "{?=call proc_CreateWeb(?,?,?,?,?,?,?,?,?,?,?,?,?,?)}"


Error Message in the front end:

Operation aborted (Exception from HRESULT: 0x80004004 (E_ABORT))

Notes: In this case the site is created successfully, but permissions are not properly assigned on the lists and document libraries.

Deadlock error in the ULS log:

Unexpected query execution failure, error code 1205. Additional error information from SQL Server is included below. "Transaction (Process ID 111) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction." Query text (if available): "{?=call proc_SecAddPrincipalToRole(?,?,?,?,?,?)}"
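Both log entries end with "Rerun the transaction." One generic way to act on that advice is to retry the failed step a few times with a growing, randomized delay. The sketch below is in Python and only illustrates the idea; it is not what our application does. `DeadlockError` is a hypothetical exception type that carries the SQL Server error code. In SharePoint the failure reaches the caller in other forms (such as the E_ABORT above), and a half-created site would have to be cleaned up before any retry.

```python
import random
import time

DEADLOCK_ERROR_CODE = 1205  # SQL Server: "chosen as the deadlock victim"


class DeadlockError(Exception):
    """Hypothetical exception carrying a SQL Server error code."""

    def __init__(self, code):
        super().__init__(f"SQL error {code}")
        self.code = code


def run_with_deadlock_retry(operation, max_attempts=4, base_delay=0.2,
                            sleep=time.sleep):
    """Call `operation`, retrying only when it fails with error 1205."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeadlockError as exc:
            # Give up on other errors, or once the attempts are used up.
            if exc.code != DEADLOCK_ERROR_CODE or attempt == max_attempts:
                raise
            # Exponential backoff plus jitter, so that two colliding
            # requests do not retry at exactly the same moment.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
```

Retrying lowers the number of failures a user sees, but it does not remove the underlying lock contention.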


We are working with Microsoft to resolve this issue. I am eagerly waiting to hear the resolution from Microsoft.

Update (01/11/2011): We were able to reproduce these deadlocks in the Microsoft development environment. Microsoft did not accept that this is a defect in SharePoint 2007. Our contact told us that he had no authority to declare it a defect; he could only bring the behavior to the attention of the SharePoint product group, and it was up to the product group to decide whether it is a defect in SharePoint 2007. After a while he came back to us and said that the product group was not prepared to release a hotfix for this issue, since SharePoint 2010 had already been released. They also said that a hotfix would mean a major change to SharePoint 2007, which they did not want to take on at this point because it could affect other parts of SharePoint 2007 that are working well.

They also told us that we hit this deadlock problem because we put a lot of stress on the SharePoint API. BreakRoleInheritance is a very heavy operation, and our site creation process breaks inheritance and assigns custom permissions on many objects. The deadlocks occur in the SQL Server database, not in the C# code. In my analysis, the problem lies in the way the stored procedures in the SharePoint database are written, not in the load we put on the SharePoint API. In the end, Microsoft proposed a couple of workarounds, which did not eliminate the deadlocks completely. So we had to change the design of the application to eliminate deadlocks up to some extent.

Both Microsoft and we also ran the same test on SharePoint 2010 and found that the problem has been eliminated to a great extent there. Below is the email from the Microsoft representative.

I have just finished porting the sample code from MOSS2007 to SPS2010 and finished the testing. I tested with 6 concurrent site creation requests each 0.5 seconds apart. Where it failed in my MOSS2007 environment under those test conditions, it finished successfully on my SPS2010 environment. I took a brief look at the user assignments at the newly created sites and the document libraries and all the user permissions are assigned. I have also found that there were some updates made to the stored procedures from MOSS2007 to SPS2010 targeted to improve performance of the stored procedures. Although I did not see any changes that were made specifically to address the deadlock issues around breakroleinheritance, my test results and findings for updates to the SPs is showing a very positive conclusion that the deadlock issue you are seeing in MOSS2007 has been largely alleviated in SPS2010.

Despite my conclusions above, I would still advise you to perform more extensive stress testing if you should decide to migrate to SPS2010 to resolve this deadlock behavior.
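The test described in the email (6 concurrent site creation requests, each 0.5 seconds apart) can be approximated with a small harness like the one below. It is a Python sketch, not the sample code Microsoft used. `request` is a hypothetical callable that would create one site and assign its permissions.

```python
import threading
import time


def run_staggered(request, count=6, interval=0.5):
    """Start `count` calls of `request`, `interval` seconds apart.

    `request` is a hypothetical callable taking the request index.
    Returns one ("ok", value) or ("failed", exception) entry per call.
    """
    results = [None] * count

    def call(index):
        try:
            results[index] = ("ok", request(index))
        except Exception as exc:  # record the failure instead of losing it
            results[index] = ("failed", exc)

    threads = []
    for index in range(count):
        thread = threading.Thread(target=call, args=(index,))
        thread.start()
        threads.append(thread)
        if index < count - 1:
            time.sleep(interval)  # stagger the start of the next request

    for thread in threads:
        thread.join()
    return results
```

For the more extensive stress testing the email recommends, `count` can be raised and `interval` lowered, and the "failed" entries compared against the ULS log.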

Background:
Microsoft had already released a hotfix for this issue, but it doesn’t really help. Below is the link to it.
http://support.microsoft.com/kb/932056

6 comments:

  1. Have you tried MAXDOP settings on SQL Server?

  2. Can you provide more detail about what you changed in your app design to resolve this? I am referring to your comment "So we had to change the design of the application to eliminate deadlocks up to some extent".

  3. We fine-tuned our application to reduce the number of transactions (insertions and deletions of permissions) on the SQL Server database to a minimum. We still experience deadlocks, but to a much smaller extent than before.

  4. Working on a similar project, and got really ticked off when, after 5000 sites had been created, the routine started erroring on the last steps (break role inheritance).
    Thanks for the post, it's been helpful (at least knowing I'm not the only one).

  5. I have the same problem, but with SharePoint 2010. When several users create multiple SharePoint subsites through my application at the same time, some sites are left half created. What do you recommend to solve this issue?
    Liat

  6. We had the same problem even with SharePoint 2010, but with fewer failures than on SharePoint 2007. As of now we do not have a solution for this problem, but we have put multiple workarounds in place to minimise it.
