Site icon Treehouse Blog

Generating Database Seed Files with ChatGPT

Photo by Shubham Dhage on Unsplash

Frameworks like Laravel and Ruby on Rails eliminate countless decisions that you would otherwise be forced to make when building web applications. Even so, a great deal of time is spent on mundane tasks rather than building the intellectual property that makes the application unique. One such task involves generating database seed files. In this post I’ll show you how you can use ChatGPT to generate both static and dynamic database seed files and quickly populate your development database with realistic data.

What is a Database Seed File?

When building a web application it is very useful to popular your user interfaces with realistic-looking data. Some of this data is faked, meaning for instance randomly generated user names and email addresses, whereas other data is templated, meaning it consists of static lists such as US states and Major League Baseball teams.

Once this data has been imported into your development database, it makes building out the various pages and dashboards much more realistic. For example here is an page populated with fake users. This test data is imported into the development database using a seed file. For instance a Laravel-based seed file containing US states might look like this:

<?php

namespace Database\Seeders;

use Illuminate\Database\Console\Seeds\WithoutModelEvents;
use Illuminate\Database\Seeder;

use App\Models\State;

class StateSeeder extends Seeder
{
    /**
     * Run the database seeds.
     */
    public function run(): void
    {

        State::truncate();

        $states = [
            ["name" => "Alabama", "abbreviation" => "AL"],
            ["name" => "Alaska", "abbreviation" => "AK"],
            ["name" => "Arizona", "abbreviation" => "AZ"],
            ["name" => "Arkansas", "abbreviation" => "AR"],
            ["name" => "California", "abbreviation" => "CA"],
            ["name" => "Colorado", "abbreviation" => "CO"],
            ["name" => "Connecticut", "abbreviation" => "CT"],
            ["name" => "Delaware", "abbreviation" => "DE"],
            ["name" => "District of Columbia", "abbreviation" => "DC"],
            ["name" => "Florida", "abbreviation" => "FL"],
            ["name" => "Georgia", "abbreviation" => "GA"],
            ["name" => "Hawaii", "abbreviation" => "HI"],
            ["name" => "Idaho", "abbreviation" => "ID"],
            ["name" => "Illinois", "abbreviation" => "IL"],
            ["name" => "Indiana", "abbreviation" => "IN"],
            ["name" => "Iowa", "abbreviation" => "IA"],
            ["name" => "Kansas", "abbreviation" => "KS"],
            ["name" => "Kentucky", "abbreviation" => "KY"],
            ["name" => "Louisiana", "abbreviation" => "LA"],
            ["name" => "Maine", "abbreviation" => "ME"],
            ["name" => "Maryland", "abbreviation" => "MD"],
            ["name" => "Massachusetts", "abbreviation" => "MA"],
            ["name" => "Michigan", "abbreviation" => "MI"],
            ["name" => "Minnesota", "abbreviation" => "MN"],
            ["name" => "Mississippi", "abbreviation" => "MS"],
            ["name" => "Missouri", "abbreviation" => "MO"],
            ["name" => "Montana", "abbreviation" => "MT"],
            ["name" => "Nebraska", "abbreviation" => "NE"],
            ["name" => "Nevada", "abbreviation" => "NV"],
            ["name" => "New Hampshire", "abbreviation" => "NH"],
            ["name" => "New Jersey", "abbreviation" => "NJ"],
            ["name" => "New Mexico", "abbreviation" => "NM"],
            ["name" => "New York", "abbreviation" => "NY"],
            ["name" => "North Carolina", "abbreviation" => "NC"],
            ["name" => "North Dakota", "abbreviation" => "ND"],
            ["name" => "Ohio", "abbreviation" => "OH"],
            ["name" => "Oklahoma", "abbreviation" => "OK"],
            ["name" => "Oregon", "abbreviation" => "OR"],
            ["name" => "Pennsylvania", "abbreviation" => "PA"],
            ["name" => "Rhode Island", "abbreviation" => "RI"],
            ["name" => "South Carolina", "abbreviation" => "SC"],
            ["name" => "South Dakota", "abbreviation" => "SD"],
            ["name" => "Tennessee", "abbreviation" => "TN"],
            ["name" => "Texas", "abbreviation" => "TX"],
            ["name" => "Utah", "abbreviation" => "UT"],
            ["name" => "Vermont", "abbreviation" => "VT"],
            ["name" => "Virginia", "abbreviation" => "VA"],
            ["name" => "Washington", "abbreviation" => "WA"],
            ["name" => "West Virginia", "abbreviation" => "WV"],
            ["name" => "Wisconsin", "abbreviation" => "WI"],
            ["name" => "Wyoming", "abbreviation" => "WY"]
        ];

        DB::table('states')->insert($states);
    }
}

Even if you don’t know anything about Laravel, the code should be fairly understandable. I created a multidimensional array consisting of each state name and its corresponding two character abbreviation.

Prior to tools like ChatGPT and Copilot, if you wanted to create a seed file containing US States, you would typically go to Google, search for something like “US State names and abbreviations”, and then painstakingly copy and paste the result into an array. Over time, developers understandably became tired of this sort of work and created blog posts and GitHub gists containing commonly used seed files. For instance if you search for “laravel us states seed file” you’ll find quite a few resources containing useful seed files. But even this is a tedious process, because you need to sort through the results, find code that meets your needs, and possibly modify it a bit further to perfectly suit your application. Surely there is a better way?

Land Your Dream Data Analyst Job in 2024!

Learn to code with Treehouse Techdegree’s curated curriculum full of real-world projects and alongside incredible student support. Build your portfolio. Get certified. Land your dream job in tech. Sign up for a free, 7-day trial today!

Start a Free Trial

Enter ChatGPT

Using ChatGPT you can generate these seed files using a simple prompt. Let’s head over to https://chat.openai.com/ and create the same seed file as I presented a moment ago. I’ll use the following prompt:

Please generate a Laravel seed file containing the two digit 
abbreviation and and name of each US state. Just provide the 
code without explanation.

ChatGPT responded with:

<?php

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;

class StatesTableSeeder extends Seeder
{
    /**
     * Run the database seeds.
     *
     * @return void
     */
    public function run()
    {
        $states = [
            ['abbreviation' => 'AL', 'name' => 'Alabama'],
            ['abbreviation' => 'AK', 'name' => 'Alaska'],
            ['abbreviation' => 'AZ', 'name' => 'Arizona'],
            ['abbreviation' => 'AR', 'name' => 'Arkansas'],
            ['abbreviation' => 'CA', 'name' => 'California'],
            ['abbreviation' => 'CO', 'name' => 'Colorado'],
            ['abbreviation' => 'CT', 'name' => 'Connecticut'],
            ['abbreviation' => 'DE', 'name' => 'Delaware'],
            ['abbreviation' => 'FL', 'name' => 'Florida'],
            ['abbreviation' => 'GA', 'name' => 'Georgia'],
            ['abbreviation' => 'HI', 'name' => 'Hawaii'],
            ['abbreviation' => 'ID', 'name' => 'Idaho'],
            ['abbreviation' => 'IL', 'name' => 'Illinois'],
            ['abbreviation' => 'IN', 'name' => 'Indiana'],
            ['abbreviation' => 'IA', 'name' => 'Iowa'],
            ['abbreviation' => 'KS', 'name' => 'Kansas'],
            ['abbreviation' => 'KY', 'name' => 'Kentucky'],
            ['abbreviation' => 'LA', 'name' => 'Louisiana'],
            ['abbreviation' => 'ME', 'name' => 'Maine'],
            ['abbreviation' => 'MD', 'name' => 'Maryland'],
            ['abbreviation' => 'MA', 'name' => 'Massachusetts'],
            ['abbreviation' => 'MI', 'name' => 'Michigan'],
            ['abbreviation' => 'MN', 'name' => 'Minnesota'],
            ['abbreviation' => 'MS', 'name' => 'Mississippi'],
            ['abbreviation' => 'MO', 'name' => 'Missouri'],
            ['abbreviation' => 'MT', 'name' => 'Montana'],
            ['abbreviation' => 'NE', 'name' => 'Nebraska'],
            ['abbreviation' => 'NV', 'name' => 'Nevada'],
            ['abbreviation' => 'NH', 'name' => 'New Hampshire'],
            ['abbreviation' => 'NJ', 'name' => 'New Jersey'],
            ['abbreviation' => 'NM', 'name' => 'New Mexico'],
            ['abbreviation' => 'NY', 'name' => 'New York'],
            ['abbreviation' => 'NC', 'name' => 'North Carolina'],
            ['abbreviation' => 'ND', 'name' => 'North Dakota'],
            ['abbreviation' => 'OH', 'name' => 'Ohio'],
            ['abbreviation' => 'OK', 'name' => 'Oklahoma'],
            ['abbreviation' => 'OR', 'name' => 'Oregon'],
            ['abbreviation' => 'PA', 'name' => 'Pennsylvania'],
            ['abbreviation' => 'RI', 'name' => 'Rhode Island'],
            ['abbreviation' => 'SC', 'name' => 'South Carolina'],
            ['abbreviation' => 'SD', 'name' => 'South Dakota'],
            ['abbreviation' => 'TN', 'name' => 'Tennessee'],
            ['abbreviation' => 'TX', 'name' => 'Texas'],
            ['abbreviation' => 'UT', 'name' => 'Ut

Note how the output abruptly stops prior to finishing the output. This is because we’ve hit ChatGPT’s currently imposed 3,000 character limit. This restriction will almost certainly change in the future so you might not run into this issue, however if so you can use this prompt to coerce ChatGPT into gracefully continuing:

continue writing the code in last response

ChatGPT will pick right up where it left off, finishing the snippet:

            ['abbreviation' => 'UT', 'name' => 'Utah'],
            ['abbreviation' => 'VT', 'name' => 'Vermont'],
            ['abbreviation' => 'VA', 'name' => 'Virginia'],
            ['abbreviation' => 'WA', 'name' => 'Washington'],
            ['abbreviation' => 'WV', 'name' => 'West Virginia'],
            ['abbreviation' => 'WI', 'name' => 'Wisconsin'],
            ['abbreviation' => 'WY', 'name' => 'Wyoming'],
        ];

        DB::table('states')->insert($states);
    }
}

From here you can copy both snippets into your seed file and proceed as desired.

Refining ChatGPT Code

Suppose you realized you made a mistake instructing ChatGPT to use abbreviation and name attributes for the seed entries, and instead meant to use abbr for the abbreviation. Rather than ask ChatGPT to regenerate the enter seed file you can say:

Please modify the $states array to use abbr as the key instead of abbreviation. Return only the states array and not the entire seed file.

ChatGPT returns:

$states = [
    ['abbr' => 'AL', 'name' => 'Alabama'],
    ['abbr' => 'AK', 'name' => 'Alaska'],
    ['abbr' => 'AZ', 'name' => 'Arizona'],
    ['abbr' => 'AR', 'name' => 'Arkansas'],
    ['abbr' => 'CA', 'name' => 'California'],
    ['abbr' => 'CO', 'name' => 'Colorado'],
    ['abbr' => 'CT', 'name' => 'Connecticut'],
    ['abbr' => 'DE', 'name' => 'Delaware'],
    ['abbr' => 'FL', 'name' => 'Florida'],
    ...

Generating Dynamic Seed Data with ChatGPT

You can also use ChatGPT to generate seed data dynamically. Most programming languages support this ability through a third-party open source dependency. See the Teacher’s notes for a few examples. In the case of PHP that third-party dependency is called Faker. ChatGPT happens to understand Faker and so we can instruct it to use Faker to generate 25 users:

You are a Laravel programmer. Create a seed file that uses the Faker library to seed 25 users into a table named users. Each user should consist of a first_name, last_name, and email field.

ChatGPT will respond with:

<?php

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;
use Faker\Factory as Faker;

class UsersTableSeeder extends Seeder
{
    /**
     * Run the database seeds.
     *
     * @return void
     */
    public function run()
    {
        $faker = Faker::create();

        for ($i = 0; $i < 25; $i++) {
            DB::table('users')->insert([
                'first_name' => $faker->firstName,
                'last_name' => $faker->lastName,
                'email' => $faker->unique()->safeEmail,
            ]);
        }
    }
}

If after generating the code you’re reminded that your users table doesn’t break the users’ names into separate first and last name fields, you can ask ChatGPT to revise the code:

Please revise your code so each user has a single field for the first and 
last name instead of separate fields first_name and last_name.

ChatGPT will respond with the following:

<?php

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;
use Faker\Factory as Faker;

class UsersTableSeeder extends Seeder
{
    /**
     * Run the database seeds.
     *
     * @return void
     */
    public function run()
    {
        $faker = Faker::create();

        for ($i = 0; $i < 25; $i++) {
            $fullName = $faker->name;
            $email = $faker->unique()->safeEmail;

            DB::table('users')->insert([
                'name' => $fullName,
                'email' => $email,
            ]);
        }
    }
}

In this response I don’t like how ChatGPT first declared $fullName and $email variables, because the code will be a tad cleaner if we just instantiate the name and safeEmail methods directly inside the insert method. Let’s ask it to revise the code:

Please revise the code to call $faker object methods inside 
the insert method instead of first instantiating variables for the name and email.

Its response:

<?php

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;
use Faker\Factory as Faker;

class UsersTableSeeder extends Seeder
{
    /**
     * Run the database seeds.
     *
     * @return void
     */
    public function run()
    {
        $faker = Faker::create();

        for ($i = 0; $i < 25; $i++) {
            DB::table('users')->insert([
                'name' => $faker->name,
                'email' => $faker->unique()->safeEmail,
            ]);
        }
    }
}

Great!

Conclusion

Tools like ChatGPT and GitHub Copilot can really supercharge your productivity by helping you to quickly knock out tedious code and data sets that otherwise would have required a fair bit of Googling, copying, pasting, and reformatting. If you found this blog post useful I invite you to sign up for a free seven day Treehouse subscription to learn more about ChatGPT, AI, LLMs, data science, and more!

Exit mobile version