sqlite

Links

CLI Shell info: http://www.sqlite.org/sqlite.html
Better CLI tutorial: http://souptonuts.sourceforge.net/readme_sqlite_tutorial.html
FAQ - http://www.sqlite.org/faq.html
When to use SQLite - https://www.sqlite.org/whentouse.html
SpatiaLite - "SpatiaLite is an open source library intended to extend the SQLite core to support fully fledged Spatial SQL capabilities."
Improved CLI - https://github.com/dbcli/litecli
GUI tool with sqlite support - https://dbeaver.io
"An open source multi-tool for exploring and publishing data" https://docs.datasette.io/en/stable/ / https://simonwillison.net/2018/Aug/19/instantly-publish-datasette/
https://www.sqlite.org/appfileformat.html
https://sqlite.org/src4/doc/trunk/www/design.wiki The Design of SQLite4
https://til.simonwillison.net/sqlite/one-line-csv-operations Simon Willison has a lot of great sqlite knowledge, tools, and examples.
https://sqlitestudio.pl: GUI app to "Create, edit, browse SQLite databases."

Syntax Examples

Open a db read-only

sqlite3 "file:///absolute/path/to/file/datasette.db?mode=ro"

Import a csv file

This method does not appear to support ~/filename or $HOME/filename, but does support relative and absolute paths. The sqlite3 help text says that -csv will "set output mode to 'csv'", but it also affects .import statements.

sqlite3 -csv filename.db ".import path/to/some_file.csv destination_table_name"

There are other modes that you can import, which are shown by .mode help inside a sqlite3 shell. For instance, you can use -tabs to import a TDL file.

Export to a csv file, including headers

This would export the cards database table as a csv:

sqlite3 -csv pokemon-card-collection.db -cmd '.headers on' 'select * from cards' > pokemon-cards.csv

Create a table

CREATE TABLE servers (
  id INTEGER NOT NULL,
  hostname VARCHAR(255),
  ip_addr VARCHAR(32),
PRIMARY KEY (id), UNIQUE (id,hostname));

Or from a unix shell

sqlite3 foo.db "CREATE TABLE servers (
  id INTEGER NOT NULL,
  hostname VARCHAR(255),
  ip_addr VARCHAR(32),
PRIMARY KEY (id),
UNIQUE (id,hostname));"

Add a column to the table

ALTER TABLE servers ADD os varchar(255);

Create a view

"a view is the result set of a stored query, which can be queried in the same manner as a persistent database collection object" - https://en.wikipedia.org/wiki/View_(SQL)

https://www.sqlite.org/lang_createview.html

The following view would show only rows from servers where the ip_addr starts with 192.168.. This is an effective way to move logic into the database, potentially reudcing app complexity:

CREATE VIEW local_servers AS
SELECT hostname,ip_addr FROM servers WHERE ip_addr like '192.168.%' ;

Add rows to the table from unix shell

sqlite3 foo.db "insert into baz values ('50','some text');"

Add rows or update if the row already exists

This syntax is different from other SQL implementations

insert or replace into tablename(filename, hash) values
  ('/etc/hosts', 'somehash'),
  ('/etc/resolv.conf', 'someotherhash');

Output rows via the unix shell

This outputs as columns, but csv, html line, and list exist too.

sqlite3 -column foo.db "SELECT * FROM baz LIMIT 5;"

If -column is truncating your output, instead use -list -separator ' '

-line is the equivalent of mysql's \G or postgres's \x

Select a field where its value occurs more than N times

select DateTimeOriginal
from photos
group by DateTimeOriginal
having count(DateTimeOriginal) > 1 ;

Select field A where the value Field B occurs more than N times

This selects all values for field A (SourceFile), including where duplicates exist for field B (DateTimeOriginal). The prior example would not have shown this if we had added SourceFile to the select.

select SourceFile from photos
where DateTimeOriginal in (
  select DateTimeOriginal from photos
  group by DateTimeOriginal
  having count(DateTimeOriginal) > 1
) order by SourceFile ;

Select a random row from a table

TABLE='costreport'
sqlite3 -line CostReport-1.db "SELECT * FROM $TABLE
WHERE _ROWID_ >= (abs(random()) % (SELECT max(_ROWID_) FROM $TABLE))
LIMIT 1 ;"

Select data from a table and include an incrementing id column

Given the following data:

sqlite> .schema
CREATE TABLE IF NOT EXISTS "people"(
"name" TEXT, "age" TEXT);
sqlite> .mode box
sqlite> select * from people ;
┌───────┬─────┐
│ name  │ age │
├───────┼─────┤
│ brad  │ 18  │
│ david │ 9   │
│ grace │ 29  │
│ john  │ 51  │
│ katie │ 23  │
│ nora  │ 33  │
└───────┴─────┘

You can sort by age and add an id column to show the numeric ordering of their age:

sqlite> select row_number() over (order by cast(age as integer)) as id, * from people ;
┌────┬───────┬─────┐
│ id │ name  │ age │
├────┼───────┼─────┤
│ 1  │ david │ 9   │
│ 2  │ brad  │ 18  │
│ 3  │ katie │ 23  │
│ 4  │ grace │ 29  │
│ 5  │ nora  │ 33  │
│ 6  │ john  │ 51  │
└────┴───────┴─────┘

The row_number() over (order by cast(age as integer)) as id adds the extra column. Because the age column is a string we have to cast it to an integer for the sort to work correctly.

Sum one column ordered by another column

This does a quick cost analysis on an AWS Cost-Report-1.csv file.

## First sanitize the column names
sed -i '1 s#[/: ]#_#g' CostReport-1.csv  # linux sed, macos use gsed
## Next import the csv into a sqlite db
sqlite3 -csv CostReport-1.db ".import CostReport-1.csv costreport"
## Then start a sqlite shell
sqlite3 CostReport-1.db

-- Output Usage Type, ARN or ID, and summed cost as columns
SELECT lineItem_UsageType, lineItem_ResourceId, sum(lineItem_BlendedCost) cost
FROM costreport
GROUP BY lineItem_ResourceId
ORDER BY cost ;

Dump a db from CLI

sqlite3 foo.db ".dump" > foo.sql

Search Skype chat history

sqlite3 "$HOME/Library/Application Support/Skype/daniel.austin/main.db" \
  "SELECT author, timestamp, body_xml FROM messages WHERE body_xml LIKE '%music%' ;"

Expanded functionality skype history search

function skypesearch(){
  skypeusername=$1
  searchstring=$2
  /usr/bin/env sqlite3 "$HOME/Library/Application Support/Skype/${skypeusername}/main.db" \
  "SELECT author, datetime(timestamp,'unixepoch','localtime'), body_xml
  FROM messages
  WHERE body_xml
  LIKE '%${searchstring}%' ;"
}
alias ss="skypesearch john.doe"

Quickly create an image database

A better way to do this would be to use sqlite-utils, which is part of datasette, and there is an example below that shows how to do this.

## Create the database
sqlite3 images.db "create table images (filename varchar(255), createdate timestamp, unique(filename))"
## Populate the database. This can be blindly re-run when new files are added.
exiftool -d "%s" -p 'insert into images values ("$filename", "$DateTimeOriginal");' -q -f -r . | sqlite3 images.db 2> /dev/null
## Query the database
sqlite3 images.db "SELECT filename,datetime(createdate,'unixepoch','localtime') as date FROM images WHERE date LIKE '2014-08-02%';"

Use exiftool and sqlite-utils to find duplicate photos

This example shows how to delete JPG files where a DNG also exists, for photos taken with a camera that keeps track of the ImageNumber, like most DSLRs.

First, ingest all the exif data from your photos into a sqlite db. We specify only the fields we need from exiftool, and output the data as a json blob, then use jq to reformat the json as one record per line, then send that into sqlite-utils. The sqlite-utils --nl option tells it to expect one record per line, --pk SourceFile specifies that field (which is always present in exiftool json output) as the primary key to ensure uniqueness, --replace will update any rows where the primary key already exists (so we can re-run this on an existing database). If we were not specifying exiftool fields, we would need an --alter flag here to tell sqlite-utils to add columns to the table for exif fields that did not exist in any previously imported photos.

SOURCE_DIR=/some/absolute/path/with/photos
find "${SOURCE_DIR}" -type f -iname '*.jpg' -print0 -or -iname '*.dng' -print0 |
xargs -0 exiftool -SerialNumber -ImageNumber -FileType -json |
jq -c '.[]' |
sqlite-utils insert --nl --pk SourceFile --replace ~/photos.db photos -

Now do a sql select for all JPG files that finds the duplicates. The idea here is to find distinct shots from a specific camera body which are identified by concatenating the SerialNumber and the ImageNumber together, and only select rows with FileType = JPEG where that same shot also has a FileType = DNG entry. This avoids deletion of JPG files where there is no DNG for the same shot.

This example does echo rm just for the example. Remove echo to actually do the deletion. This example also shows how to interpolate fields in sqlite3 by using || to concatenate fields and strings to produce a single string that is used to compare with, or output as a result.

sqlite3 ~/photos.db "
select SourceFile from photos
where FileType = 'JPEG'
and SerialNumber || ImageNumber in (
  select SerialNumber || ImageNumber from photos where FileType = 'DNG'
) ;
" |
xargs -r echo rm -fv

Vacuum a database file

"The VACUUM command rebuilds the database file, repacking it into a minimal amount of disk space." - https://www.sqlite.org/lang_vacuum.html

sqlite3 filename.db "VACUUM;"