Aug 25, 2010

How to Move Large SharePoint Document Libraries

Many firms regularly need to move or copy large Microsoft SharePoint document libraries from one site to another. The two sites may be on the same SharePoint farm (e.g., two extranet sites) or on different farms (e.g., staging and production). Several commercial tools handle such migrations, from vendors such as AvePoint and Metalogix, but those products can cost thousands of dollars and add significant complexity to your firm's "technology stack." In this article I'll show you how to move a document library of any size with just a bit of .NET code using the Windows SharePoint Services SDK.

THE PROBLEM

You may know that a SharePoint document library (or any list type) can be saved as a template using the native SharePoint web user interface. This feature is designed primarily as a means of saving the "format" of the library, including its field definitions and views. You do, however, have the option to save the library's content along with its definition by simply checking a box.

To save a document library and its contents to a template file, simply select the Settings menu option Document Library Settings from any view of your document library. Note that you will need to have full permissions on the document library to accomplish this task. See Figure 1.

Figure 1: Editing document library settings.

Next, click on the "Save document library as template" link. See Figure 2.

Figure 2: Saving the document library as a template.

Then fill in the resulting form to describe the document library template, check the "Include Content" box, and click OK. See Figure 3.

Figure 3: Including content.

The above operation may succeed for small or empty document libraries, but for those over 10 megabytes you will receive an error message similar to the one in Figure 4, below.

Figure 4: Error message displayed when attempting to export a large library.

In fact, the 10-megabyte limit may be increased so that you can export document libraries as large as 500 megabytes using the following STSADM command:

stsadm -o setproperty -propertyname max-template-document-size -propertyvalue 500000000

The value 500000000 is the maximum template size to allow, in bytes. If every document library you wish to export as a template is smaller than this value, you're all set!

At my firm, however, we regularly see document libraries in excess of 1 gigabyte, and have had many in the tens-of-gigabyte range. Leaving aside discussions of "best practices" for storage management, this makes transferring document libraries using templates unworkable for us in many instances.
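To see which side of the template limit a given library falls on, you can total its file sizes with the object model. The following is only a rough sketch, not a definitive measurement: it reuses the example site URL and library name that appear later in this article, it counts current file versions only (not version history), and the DocLibSize namespace is my own invention for illustration. Like the other programs in this article, it would need to run on a SharePoint web front-end server.

```csharp
using System;
using Microsoft.SharePoint;

namespace DocLibSize
{
    class Program
    {
        static void Main(string[] args)
        {
            // Limit taken from the STSADM example above, in bytes
            const long templateLimitBytes = 500000000;

            // Assumed site URL and library name, for illustration only
            using (SPSite site = new SPSite("http://localhost/sites/SITE1"))
            {
                SPWeb web = site.RootWeb;
                SPList list = web.Lists["Shared Documents"];

                // Return files from every folder in the library
                SPQuery query = new SPQuery();
                query.ViewAttributes = "Scope=\"Recursive\"";

                long totalBytes = 0;
                foreach (SPListItem item in list.GetItems(query))
                {
                    if (item.File != null)
                    {
                        // Size of the current version of the file
                        totalBytes += item.File.Length;
                    }
                }

                Console.WriteLine("Total size: {0} bytes", totalBytes);
                Console.WriteLine(totalBytes <= templateLimitBytes
                    ? "Within the template limit."
                    : "Over the template limit.");
            }
        }
    }
}
```

Treat the total as an estimate. The size of the saved template file may differ from the sum of the individual file sizes, and walking every item can be slow on a very large library.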

ENTER STSADM EXPORT AND IMPORT

SharePoint ships with the command-line utility STSADM (an example of which we saw in the preceding section). STSADM has a number of functions, including those for backing up and restoring site collections or web sites. STSADM also includes export and import functions that overlap with "backup" and "restore," and in many situations the two can be used interchangeably. The primary difference is that the backup and restore functions are designed with backup or disaster recovery in mind, and provide very limited control over what content is included in the backup. The export and import functions, on the other hand, provide the flexibility to copy an entire site collection or a single web site to a new location on the same or a different farm. They also provide a limited number of options to determine what content to copy, such as:

• which versions of documents to include; and

• whether to copy a site's security as well as its content.

While the STSADM export and import functions may fit the bill if you don't mind copying the entire site in which the source document library resides, they are not designed to copy an individual document library alone.

EXPORT AND IMPORT WITH THE SHAREPOINT OBJECT MODEL

To accomplish the goal of copying an arbitrarily large document library, and only that document library, from one site to another, we'll need to use a bit of .NET code and the SharePoint object model. The code will perform two broad tasks:

  1. export the source document library to a set of one or more files on disk; and

  2. import the file-set created in step 1 to the target location.

You could write this code in a number of ways, but for simplicity, I'll create two .NET console applications: ExportDocLib and ImportDocLib. Each application must be run on a SharePoint web front-end server: ExportDocLib on the farm containing the source document library, and ImportDocLib on the farm containing the target.

ExportDocLib exports the source document library to operating system files. To create the application, open Visual Studio 2005 or later, and create a new C# console application. Next, add a reference to the Windows SharePoint Services .NET assembly (Microsoft.SharePoint.dll), and then add "using" statements at the top of the program to reference the Microsoft.SharePoint and Microsoft.SharePoint.Deployment namespaces.

The rest of the code is fairly straightforward; you will need to provide references to the site collection and the web site within it that contains your document library (I simply referenced the RootWeb property of my site collection because my document library was contained in the top-level web site), a reference to the list to be copied, and information about where to create the export files and export log. I've also instructed SharePoint to retain the security via the IncludeSecurity switch, and to display progress to a command window as the program runs using the CommandLineVerbose switch. There are many other settings you can use, but those shown below are all you'll need for a basic list export:

using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Deployment;

namespace ExportDocLib
{
    class Program
    {
        static void Main(string[] args)
        {
            // Get handle to web and doclib to export.
            // Disposing the SPSite also releases its RootWeb.
            using (SPSite site = new SPSite("http://localhost/sites/SITE1"))
            {
                SPWeb web = site.RootWeb;
                SPList list = web.Lists["Shared Documents"];

                // Define export settings
                SPExportSettings settings = new SPExportSettings();
                settings.SiteUrl = "http://localhost/sites/SITE1";
                settings.FileLocation = @"C:\Export Files\";
                settings.BaseFileName = "SITE1.cmp";
                settings.ExportMethod = SPExportMethodType.ExportAll;
                settings.LogFilePath =
                    settings.FileLocation + "SITE1_export_log.txt";
                settings.CommandLineVerbose = true;
                settings.IncludeSecurity = SPIncludeSecurity.All;

                // Add reference to document library to export
                SPExportObject exportObject =
                    new SPExportObject(
                        list.ID,
                        SPDeploymentObjectType.List,
                        web.ID, false);
                settings.ExportObjects.Add(exportObject);

                // Export it
                SPExport export = new SPExport(settings);
                export.Run();
            }
        }
    }
}
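The same approach can be narrowed to a single folder inside the library instead of the whole list. The sketch below is a variation on ExportDocLib, not something I have covered in the walkthrough above: the folder path "Shared Documents/Contracts", the output file names, and the ExportFolder namespace are all hypothetical, and the only substantive change is that the export object points at the folder's UniqueId with the type SPDeploymentObjectType.Folder.

```csharp
using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Deployment;

namespace ExportFolder
{
    class Program
    {
        static void Main(string[] args)
        {
            using (SPSite site = new SPSite("http://localhost/sites/SITE1"))
            {
                SPWeb web = site.RootWeb;

                // Hypothetical folder path, relative to the web site
                SPFolder folder = web.GetFolder("Shared Documents/Contracts");
                if (!folder.Exists)
                {
                    Console.WriteLine("Folder not found.");
                    return;
                }

                // Same export settings as ExportDocLib, with new file names
                SPExportSettings settings = new SPExportSettings();
                settings.SiteUrl = "http://localhost/sites/SITE1";
                settings.FileLocation = @"C:\Export Files\";
                settings.BaseFileName = "SITE1_Contracts.cmp";
                settings.ExportMethod = SPExportMethodType.ExportAll;
                settings.LogFilePath =
                    settings.FileLocation + "SITE1_Contracts_export_log.txt";
                settings.CommandLineVerbose = true;
                settings.IncludeSecurity = SPIncludeSecurity.All;

                // Point the export at the folder rather than the list
                SPExportObject exportObject = new SPExportObject();
                exportObject.Id = folder.UniqueId;
                exportObject.Type = SPDeploymentObjectType.Folder;
                settings.ExportObjects.Add(exportObject);

                SPExport export = new SPExport(settings);
                export.Run();
            }
        }
    }
}
```

Be aware that importing a folder-level package may take more work than importing a whole list, because SharePoint may need to be told where to attach the folder in the target site. That handling is not shown here.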

ImportDocLib is almost a mirror image of ExportDocLib. ImportDocLib will import the document library from the operating system files created by the ExportDocLib program. As before, start by creating a new C# console application in Visual Studio, add a reference to the Windows SharePoint Services .NET assembly, and add "using" statements to reference the two Microsoft.SharePoint namespaces.

Next you'll create a SPImportSettings object and set its properties to define the location of the import files, the location of the site collection and web site where you want your new copy of the document library, and a location for the import log.

Although it’s beyond the scope of this article, the RetainObjectIdentity setting is noteworthy because its value determines whether you can apply subsequent incremental imports to this same document library. For example, you could copy the full library once, and then periodically import only changed or new items from the source into the target library. To enable these subsequent imports, the RetainObjectIdentity setting must be set to "true." You may not set it to "true," however, if you will be importing a document library into the same content database as the source library, because all objects in a SharePoint database must have unique object IDs. The most likely scenario for which you would use the RetainObjectIdentity switch is to create a copy of a document library on a different SharePoint farm, and subsequently to refresh that second library with updates from the original.

using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Deployment;

namespace ImportDocLib
{
    class Program
    {
        static void Main(string[] args)
        {
            // Settings for import
            SPImportSettings settings = new SPImportSettings();

            // File & path
            settings.FileLocation = @"C:\Export Files\";
            settings.BaseFileName = "SITE1.cmp";

            // Site and web to import to
            settings.SiteUrl = "http://localhost/sites/SITE2";
            settings.WebUrl = "http://localhost/sites/SITE2";

            // Set log file location
            settings.LogFilePath =
                settings.FileLocation + "SITE2_import_log.txt";

            // Display messages while running
            settings.CommandLineVerbose = true;

            // Don't retain object GUIDs; only necessary
            // if you want to do incremental imports to the
            // same list at a later time
            settings.RetainObjectIdentity = false;

            // Keep security, versions, and date/time stamps
            settings.IncludeSecurity = SPIncludeSecurity.All;
            settings.UpdateVersions = SPUpdateVersions.Append;
            settings.UserInfoDateTime =
                SPImportUserInfoDateTimeOption.ImportAll;

            // Import it
            SPImport import = new SPImport(settings);
            import.Run();
        }
    }
}
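For the incremental scenario mentioned in the discussion of RetainObjectIdentity, the export side might look something like the following. This is only a sketch under several assumptions. The token file path and the ExportDocLibIncremental namespace are hypothetical, and the code relies on the ExportChangeToken and CurrentChangeToken properties of SPExportSettings to carry the change token from one run to the next. I have not covered this approach elsewhere in the article, so test it carefully before relying on it.

```csharp
using System;
using System.IO;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Deployment;

namespace ExportDocLibIncremental
{
    class Program
    {
        static void Main(string[] args)
        {
            // Hypothetical file that remembers the change token between runs
            string tokenFile = @"C:\Export Files\SITE1_change_token.txt";

            using (SPSite site = new SPSite("http://localhost/sites/SITE1"))
            {
                SPWeb web = site.RootWeb;
                SPList list = web.Lists["Shared Documents"];

                SPExportSettings settings = new SPExportSettings();
                settings.SiteUrl = "http://localhost/sites/SITE1";
                settings.FileLocation = @"C:\Export Files\";
                settings.BaseFileName = "SITE1_changes.cmp";
                settings.OverwriteExistingDataFile = true;
                settings.LogFilePath =
                    settings.FileLocation + "SITE1_export_log.txt";
                settings.CommandLineVerbose = true;
                settings.IncludeSecurity = SPIncludeSecurity.All;

                if (File.Exists(tokenFile))
                {
                    // Later runs: export only what changed since the saved token
                    settings.ExportMethod = SPExportMethodType.ExportChanges;
                    settings.ExportChangeToken =
                        File.ReadAllText(tokenFile).Trim();
                }
                else
                {
                    // First run: export everything
                    settings.ExportMethod = SPExportMethodType.ExportAll;
                }

                settings.ExportObjects.Add(
                    new SPExportObject(
                        list.ID,
                        SPDeploymentObjectType.List,
                        web.ID, false));

                SPExport export = new SPExport(settings);
                export.Run();

                // Save the token so the next run starts from this point
                File.WriteAllText(tokenFile, settings.CurrentChangeToken);
            }
        }
    }
}
```

On the import side, every run (including the first full import) would need RetainObjectIdentity set to "true," with the restriction already noted: the target cannot be in the same content database as the source.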

You might correctly observe that the ExportDocLib and ImportDocLib routines could be combined into a single program. This would simplify the process in that you wouldn't need to execute two separate programs to complete the copy. But when copying a document library from one SharePoint farm to another, you will need to run ExportDocLib on a web front-end (WFE) server in the source farm, and ImportDocLib on a WFE server in the target farm. Keeping them separate gives you the flexibility you need in such instances.

CONCLUSION

With just a bit of .NET coding you can copy document libraries of arbitrary size from one site to another, within or between farms, without the need to purchase a third-party product. With a bit of additional coding you can add the capability of applying incremental updates from a master document library to a copy, thus keeping the secondary library in sync.

1 comment:

nsp said...

How do i change it to folder level which is i want copy?