PostgreSQL
"PostgreSQL is a powerful, open source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance." - https://www.postgresql.org
Usage
Connect to a database
psql "postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOSTNAME}:${POSTGRES_PORT:-5432}/${POSTGRES_DB}"
Meta Commands
postgres shell has a lot of meta commands. See https://www.postgresql.org/docs/15/app-psql.html#APP-PSQL-META-COMMANDS for full descriptions.
meta-command | behavior |
---|---|
\l |
List databases. Filter like \l foo* |
\c |
Connect to database. There are various syntaxes to accomplish this. Here is one: \c "host=localhost port=5432 dbname=mydb connect_timeout=10 sslmode=disable" |
\d "$table_name" |
Show full definition for the given table |
\d or \d+ |
Show extended table description. Show all with \d+ * |
\df |
List functions |
\di |
List indexes |
\x |
Toggle expanded display. This is the same as \G in MySQL, separating each record and showing each column as a row formatted as column_name \| row value . |
Show postgres version
astronomer_houston=> SELECT version();
version
--------------------------------------------------------------------------------------------
PostgreSQL 9.6.18 on x86_64-pc-linux-gnu, compiled by Debian clang version 10.0.1 , 64-bit
(1 row)
Show processes
Each process is one connection to the db. (See How Connections are Establisted)
select * from pg_stat_activity ;
or
select count(*) from pg_stat_activity where usename = 'airflow' ; -- note this is usename, not useRname
or a more refined view
select pid as process_id,
usename as username,
datname as database_name,
client_addr as client_address,
application_name,
backend_start,
state,
state_change
from pg_stat_activity
where state != 'idle' ;
Terminate all processes connected to a given database
select pg_terminate_backend(pg_stat_activity.pid)
from pg_stat_activity
where pg_stat_activity.datname = 'some_db_name'
and pid <> pg_backend_pid();
Postgres in Docker
- Official images: https://hub.docker.com/_/postgres
Some of these syntaxes apply to non-docker interactions too, so long as you remove the docker-isms from them.
Example docker-compose file for local development
## https://docs.docker.com/compose/compose-file/
version: "3.7"
services:
postgres:
# https://hub.docker.com/_/postgres
image: "postgres:latest"
restart: "always"
env_file: .env # Should contain POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD
# Uncomment the 'command' line to enable postgres query logging to the terminal
# https://www.postgresql.org/docs/15/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN
# command: [ "postgres", "-c", "log_destination=stderr", "-c", "log_min_messages=debug", "-c", "log_min_error_statement=debug" ]
expose:
- "5432"
ports:
- "5432:5432"
Dump a database
https://www.postgresql.org/docs/15/app-pgdump.html
docker exec "${POSTGRES_CONTAINER}" pg_dump -U "${POSTGRES_USER}" "${POSTGRES_DB}"
Full backups should be performed with pg_dumpall.
Load local data into a db
https://www.postgresql.org/docs/15/app-psql.html
cat foo.sql |
docker exec -i "${POSTGRES_CONTAINER}" \
psql -U "${POSTGRES_USER}" "${POSTGRES_DB}"
Example queries
Show sizes for all DBs in a given server
pokemon_collector=> SELECT
pg_database.datname,
pg_database_size(pg_database.datname) AS size,
pg_size_pretty(pg_database_size(pg_database.datname))
FROM pg_database
ORDER BY size ;
datname | size | pg_size_pretty
----------------+---------------+----------------
postgres | 7436824 | 7263 kB
cloudsqladmin | 7641624 | 7463 kB
pokemon | 8231448 | 8039 kB
smeargle_4080 | 10230296 | 9991 kB
ampharos_7008 | 10877464 | 10 MB
litten_7803 | 12655128 | 12 MB
silvally_9992 | 64589765144 | 60 GB
cinderace_8705 | 86162946584 | 80 GB
emolga_1932 | 92073171480 | 86 GB
aegislash_4257 | 1265669151256 | 1179 GB
(10 rows)
Show all of the table sizes for a given schema
sunny_moon_4257_airflow=> SELECT table_name, pg_total_relation_size(table_name), pg_size_pretty(pg_total_relation_size(table_name))
FROM sunny_moon_4257_airflow.information_schema.tables
where table_schema = 'airflow'
ORDER BY pg_total_relation_size(table_name) desc ;
table_name | pg_total_relation_size | pg_size_pretty
-------------------------------+------------------------+----------------
xcom | 1269437857792 | 1182 GB
job | 77586432 | 74 MB
dag_run | 60440576 | 58 MB
log | 58630144 | 56 MB
task_instance | 31784960 | 30 MB
serialized_dag | 851968 | 832 kB
rendered_task_instance_fields | 843776 | 824 kB
task_fail | 638976 | 624 kB
import_error | 393216 | 384 kB
dag | 122880 | 120 kB
dag_code | 122880 | 120 kB
ab_user | 98304 | 96 kB
ab_permission_view_role | 90112 | 88 kB
astro_available_version | 90112 | 88 kB
slot_pool | 81920 | 80 kB
ab_user_role | 73728 | 72 kB
ab_view_menu | 73728 | 72 kB
ab_permission | 73728 | 72 kB
ab_role | 73728 | 72 kB
ab_permission_view | 73728 | 72 kB
astro_version_check | 65536 | 64 kB
sensor_instance | 57344 | 56 kB
alembic_version | 57344 | 56 kB
connection | 24576 | 24 kB
task_reschedule | 24576 | 24 kB
ab_register_user | 24576 | 24 kB
sla_miss | 24576 | 24 kB
variable | 24576 | 24 kB
dag_pickle | 16384 | 16 kB
known_event | 16384 | 16 kB
dag_tag | 8192 | 8192 bytes
known_event_type | 8192 | 8192 bytes
(32 rows)
Show the rows with the largest values
sunny_moon_4257_airflow=> select timestamp, pg_column_size(value) as size from xcom order by size desc limit 20 ;
timestamp | size
-------------------------------+-----------
2021-09-25 14:23:40.0142+00 | 188149150
2021-09-24 14:24:39.699152+00 | 171979158
2021-09-23 14:24:08.201579+00 | 158880026
2021-09-22 14:24:03.309817+00 | 144807562
2021-09-21 14:24:25.052796+00 | 129923562
(5 rows)
See Also
- pgcli: "Pgcli is a command line interface for Postgres with auto-completion and syntax highlighting." https://github.com/dbcli/pgcli
- http://www.pgadmin.org - Graphical UI for postgres
- MySQL: Another relational database
- SQLite: File based local database that does not require a server.
- https://hakibenita.com/postgresql-unknown-features: "Lesser Known PostgreSQL Features. Features you already have but may not know about!"
- https://github.com/citusdata/citus: Horizontal scaling extension
- https://cloud.google.com/alloydb: GCP managed postgres that has advanced clustered scaling features
- https://www.postgresql.org/docs/current/postgres-fdw.html: module that provides sharding across multiple postgres servers
- https://www.cybertec-postgresql.com/en/btree-vs-brin-2-options-for-indexing-in-postgresql-data-warehouses
- https://www.amazingcto.com/postgres-for-everything
- https://postgrest.org/en/stable: REST API directly from postgres
- https://challahscript.com/what_i_wish_someone_told_me_about_postgres