Running and Debugging DBgen models¶
In this section, we walk through the most commonly used command line options that DBgen offers.
dbgen connect¶
The dbgen connect
command connects the user to the postgres database directly. It is the same as using the psql ...
command and passing in the name of the database, the username, the password, and the port. Since dbgen needs all of that information to function anyway, dbgen exposes dbgen connect
simply for convenience.
dbgen connect --test
tests to make sure that the credentials that have been set in the .env file or passed to the command line are valid and allow DBgen to connect to the database. When setting up a new project or pointing an existing model at a new database, it is useful to run this command first to make sure that everything is configured correctly before you run the model.
dbgen run¶
Running Models¶
The dbgen run
command is what is used to run the data pipeline. We'll walk through some of the commonly used options when using the dbgen run
command. These options can be passed as command line arguments, or defaults can be set in the .env file, and global defaults exist for many of the options. DBgen will first use the values passed to the command line if they are present, then fall back on the values set in the .env file, and finally fall back on global defaults if they are not set in either the command line or the .env file.
The --model option¶
To run dbgen models, we must define a function somewhere that returns the dbgen model. Then, when the model is run, the syntax is...
$ dbgen run --model [module_name]:[function_name]
...where the [module_name]
and [function_name]
refer to the location in the code where the function that returns the dbgen model is stored.
The --build flag¶
When this flag is set, the existing database is torn down completely before the model is run. This needs to be done if you have made a change to the schema since the last time you ran the model.
The --include and --exclude options¶
These options are used to only run specific ETLSteps. The syntax is --include [regex]
or --exclude [regex]
. In the include
case, any ETLStep name that matches the regex will be run, and the others will not be run. Conversely, in the exclude
case, any ETLSteps that match the regex will not be run, and all of the others will be run.
Debugging Models¶
DBgen offers the following command line tools to assist with debugging.
The --pdb flag¶
If you are actively developing a model and would like to insert a breakpoint()
(also written pdb.set_trace()
) in your code, you need to add the --pdb
flag when running the model. This is because DBgen by default does a lot of exception handling to ensure that a bug in one ETLStep has as little impact as possible on the rest of the ETLSteps. The --pdb
flag changes the exception handling protocol so that the python debugger (pdb) will work normally.