The Scary World of Data Migration… From HTML Pages to Drupal 7, Part 2

So we’ve successfully used QueryPath to extract our data from the flat HTML files. Now it’s time to put that data into our Drupal 7 site.

There were some great Drupal 6 modules for handling data imports, like Feeds and Node Import. Unfortunately, neither of these is fully baked for Drupal 7. Fear not, though; there is a way. As in Drupal 6, you can easily load the Drupal runtime in custom scripts. This gives you access to the full Drupal 7 API and, in the case of data migration, lets you create nodes programmatically. I’ve already set up my content type and named it “review”. I’ve also set up a folder called “qp” in the Drupal root directory where my custom PHP file will live. Let’s get started:

Create the PHP file and include the Drupal API

Create a file in the qp directory called index.php. This will allow us to access this file using http://yoursite/qp. Now for the magic that will allow us to fire up the Drupal bootstrap and use Drupal’s database layer:

[php]
define('DRUPAL_ROOT', '../');
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
[/php]
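
The relative path '../' only works while the script sits exactly one level below the Drupal root. If you expect to move the script around, here is a minimal sketch of one way to locate the root instead. The helper name qp_find_drupal_root() is my own invention for illustration and is not part of Drupal; it simply walks up the directory tree until it finds includes/bootstrap.inc.

[php]
<?php
/**
 * Hypothetical helper (not a Drupal function): walk up from $start until a
 * directory containing includes/bootstrap.inc is found.
 *
 * Returns the absolute path of that directory, or FALSE if none is found.
 */
function qp_find_drupal_root($start) {
  $dir = realpath($start);
  while ($dir !== FALSE) {
    if (is_file($dir . '/includes/bootstrap.inc')) {
      return $dir;
    }
    $parent = dirname($dir);
    if ($parent === $dir) {
      // Reached the top of the filesystem without finding Drupal.
      return FALSE;
    }
    $dir = $parent;
  }
  return FALSE;
}

$root = qp_find_drupal_root(dirname(__FILE__));
if ($root === FALSE) {
  print "Could not find includes/bootstrap.inc above this script.\n";
}
else {
  print "Drupal root: $root\n";
}
[/php]

With a root in hand, you would pass $root to define() in place of '../' and bootstrap exactly as shown above.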

Get the import data and create a loop

When I extracted the data using QueryPath, I stored it in a custom database table called import_data. What we’re going to do is query for all the records, then loop through them, adding a new node for each one. Here’s a peek at the first record from my import_data table:

[php]
+-------------+------------------------------------------------------+
| id          | 1                                                    |
| name        | Review #1                                            |
| description | <p>This is the first review</p>                      |
| origin      | <p>The review originated from the New York Times</p> |
| reliability | <p>This review is very reliable</p>                  |
+-------------+------------------------------------------------------+
[/php]

If you are using a CSV text file, the same approach applies. You would just need to use the CSV functions provided by PHP to create the loop. Check out PHP.net for an example. Okay, here’s a simple query setup:

[php]
$inserts = db_query('SELECT * FROM import_data');
[/php]

And now to set up the loop:

[php]
foreach ($inserts as $row) {
  // each $row is an object whose properties are the table columns
}
[/php]
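
If your data lives in a CSV file rather than a database table, the loop could be sketched roughly like this. I’m assuming the first line of the file holds the column names, and the helper name qp_read_csv_rows() is made up for this example. The sample file is written to a temp location only so the snippet runs on its own.

[php]
<?php
/**
 * Hypothetical helper: read a CSV file whose first line holds column names
 * and return one object per data row, similar to the $row objects above.
 */
function qp_read_csv_rows($path) {
  $rows = array();
  $handle = fopen($path, 'r');
  if ($handle === FALSE) {
    return $rows;
  }
  // Delimiter, enclosure and escape are passed explicitly so the behaviour
  // does not depend on the defaults of the PHP version in use.
  $header = fgetcsv($handle, 0, ',', '"', '\\');
  if ($header === FALSE) {
    fclose($handle);
    return $rows;
  }
  while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== FALSE) {
    // Skip blank lines and rows whose column count does not match.
    if (count($fields) !== count($header)) {
      continue;
    }
    $rows[] = (object) array_combine($header, $fields);
  }
  fclose($handle);
  return $rows;
}

// Sample data for the sketch; a real import would point at an existing file.
$path = tempnam(sys_get_temp_dir(), 'import_');
file_put_contents($path,
  "name,description\n" .
  "\"Review #1\",\"<p>This is the first review</p>\"\n"
);

foreach (qp_read_csv_rows($path) as $row) {
  // The node creation code would go here, exactly as in the database loop.
  print $row->name . "\n";
}
unlink($path);
[/php]

Because each row comes back as an object, the rest of the walkthrough ($row->name, $row->description, and so on) should work unchanged.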

Make the node object and load up the basics

Inside the loop, we need to create a node object and load it up with our data. First, the empty node object:

[php]
$node = new stdClass();
[/php]

Now, we need to assign the node type, or content type. Make sure to use the machine name for this:

[php]
$node->type = 'review';
[/php]

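We need to prepare the $node object using the node_object_prepare function that Drupal provides us. This function fills in the default values for a new node of this content type (such as the published and promoted settings, the author, and the creation time) and invokes the prepare hooks, so the object is ready for us to add our own values and save it.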

[php]
node_object_prepare($node);
[/php]

Next, we need to define the language that this node will use. Since my site is not language specific, I’m using “und” (the value of Drupal’s LANGUAGE_NONE constant). For more on the language features for Drupal, visit http://drupal.org/node/324602. I decided to store the language in a variable so I can use it later when we get to the field data.

[php]
$lang = 'und';
$node->language = $lang;
[/php]

We now need to define a few other basic values, like the uid of the creator of this node, the title, comments enabled or disabled, the created and changed timestamps, and the username of the creator:

[php]
$node->uid = 1;
$node->title = $row->name;
$node->comment = 0;
$node->created = time();
$node->changed = time();
$node->name = 'admin';
[/php]

Add our field data

We’ve got the basic node requirements taken care of, so it’s time to add our field data. For my node, I used the standard body field that is native to the content type as well as two other fields, Origin and Reliability. The machine names of these fields are field_origin and field_reliability, respectively. Both fields are textareas and I’d like the format to be Full HTML. Let’s add the description first using the description from our database record $row:

[php]
$node->body = array(
  $lang => array(
    0 => array(
      'value' => $row->description,
      'format' => 'full_html',
      'safe_value' => $row->description,
    ),
  ),
);
[/php]

Now for the Origin field:

[php]
$node->field_origin = array(
  $lang => array(
    0 => array(
      'value' => $row->origin,
      'format' => 'full_html',
      'safe_value' => $row->origin,
    ),
  ),
);
[/php]

And finally, the Reliability field:

[php]
$node->field_reliability = array(
  $lang => array(
    0 => array(
      'value' => $row->reliability,
      'format' => 'full_html',
      'safe_value' => $row->reliability,
    ),
  ),
);
[/php]
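
Since all three fields share the same structure, you could fold the repetition into a small helper. This is just a sketch: qp_text_field() and the $map array are names I made up for illustration, not anything Drupal provides, and the $row object here is a stand-in for a real import_data record.

[php]
<?php
/**
 * Hypothetical helper (not a Drupal API function): wrap a single text value
 * in the language/delta structure used for the fields above.
 */
function qp_text_field($value, $lang = 'und', $format = 'full_html') {
  return array(
    $lang => array(
      0 => array(
        'value' => $value,
        'format' => $format,
        'safe_value' => $value,
      ),
    ),
  );
}

// Stand-in for one record from import_data.
$row = (object) array(
  'description' => '<p>This is the first review</p>',
  'origin' => '<p>The review originated from the New York Times</p>',
  'reliability' => '<p>This review is very reliable</p>',
);

$node = new stdClass();

// Map node field names to the import_data columns that feed them.
$map = array(
  'body' => 'description',
  'field_origin' => 'origin',
  'field_reliability' => 'reliability',
);
foreach ($map as $field => $column) {
  $node->$field = qp_text_field($row->$column);
}

print $node->field_origin['und'][0]['value'] . "\n";
[/php]

Adding another text field to the import then only means adding one more entry to $map.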

Save the node

With our node object created and fully loaded with our data, let’s save it into Drupal. We do a quick if statement just to make sure that the node_submit function returns properly, then follow it up with a node_save and a success message to the screen that includes the node id:

[php]
if ($node = node_submit($node)) {
  node_save($node);
  // double quotes so that \n is output as a real newline
  print 'Node with nid ' . $node->nid . " saved!\n<br />";
}
[/php]

The entire example file

[php]
<?php
// fire up the drupal bootstrap
define('DRUPAL_ROOT', '../');
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// set up the query object for the import data
$inserts = db_query('SELECT * FROM import_data');

// loop through the import records
foreach ($inserts as $row) {

  // start by creating an empty node object
  $node = new stdClass();

  // define the content type using the machine name, and prepare the node
  $node->type = 'review';
  node_object_prepare($node);

  // now define the language
  $lang = 'und';

  // define the basic node values
  $node->uid = 1;
  $node->title = $row->name;
  $node->comment = 0;
  $node->language = $lang;
  $node->created = time();
  $node->changed = time();
  $node->name = 'admin';
  $node->picture = 0;

  // add the description data
  $node->body = array(
    $lang => array(
      0 => array(
        'value' => $row->description,
        'format' => 'full_html',
        'safe_value' => $row->description,
      ),
    ),
  );

  // add the origin field data
  $node->field_origin = array(
    $lang => array(
      0 => array(
        'value' => $row->origin,
        'format' => 'full_html',
        'safe_value' => $row->origin,
      ),
    ),
  );

  // add the reliability field data
  $node->field_reliability = array(
    $lang => array(
      0 => array(
        'value' => $row->reliability,
        'format' => 'full_html',
        'safe_value' => $row->reliability,
      ),
    ),
  );

  // save the node
  if ($node = node_submit($node)) {
    node_save($node);
    print 'Node with nid ' . $node->nid . " saved!\n<br />";
  }

}

print '-- COMPLETE';
[/php]

Success!

Once the page loads completely, you should see a success message with the nid of each node that was created. Now you can cruise over to /admin/content to see your newly added nodes.

Tips

I found it helpful to create a review node through the Drupal UI so I could use it as an example of how to define the fields and node values. To see the node object you can use either the Devel module and a dsm command, or put the following code into a custom module’s hook_init, and then view the node.

[php]drupal_set_message('<pre>' . check_plain(print_r(menu_get_object(), TRUE)) . '</pre>');[/php]