SQL style guide

From brainsik
Jump to navigation Jump to search

Originally crafted by Matt Chisholm. Has had only minor edits.

Identifiers

All identifiers

The database schema must be immediately comprehensible to any technical person who knows SQL, without requiring access to the application code, data, or any programmer or DBA.

  • No acronyms or abbreviations
    • Except when the acronym or abbreviation is long-ish and in such common use that it is an English word in its own right, such as laser (acronym) or phone (abbreviation). Technical acronyms in common use around the office or in the field do not qualify for this exception. Neither does cm for centimeters, because short acronyms are easily confused with other acronyms.
  • No camelCase or StudlyCaps
    • Even though SQL identifiers are case-insensitive, you still shouldn't. Lowercase is perfectly good.
  • No wordsruntogether
  • Do not use negated or negative terminology:
    • use send_claim=True instead of do_not_send_claim=False
    • use escape_output=True instead of disable_output_escaping=False
  • Do not use the following substrings. They can be redundant and/or confusing:
    • table
    • row
    • record
    • column
    • trigger
    • key
    • database
    • db
    • user (because it is a keyword in SQL)
    • profile
  • Do not use vague terms. They can be redundant and/or confusing. Examples include, but are not limited to: value, length, status, store, &c.
  • Long identifiers are ok and almost always good. You only have to type them in a few places, most likely once in the ORM (Object-Relational Mapper) and a few more times in the controller. Optimize for comprehension, not keystrokes.
  • Always separate different words with _ (underscore)

Column identifiers

  • Do not include the data type in the identifier. sent_bool or address_string are not allowed. Use sent or address instead. If you are tempted to include the data type in the identifier, pick a better identifier.
  • Do not include the table name, or part of the table name, in column identifiers.
  • Unique integer IDs on the table must always be called id, never table_id or foobar_id
  • Non-integer unique IDs can be named however you like (may change in the future).
  • All foreign keys must be prefixed with the name of the table that they reference: telephone_number_id, never telephone_number or foreign_table_id
  • All foreign keys that point to unique integer ids must be suffixed with the _id suffix.

Index identifiers

  • Indexes must be prefixed with name of the table.
  • Indexes should list the column (or columns) that they index across.
  • Indexes must be suffixed with _index

Examples:

  • appointment_id_index
  • address_id_country_index

Trigger identifiers

  • Trigger names should contain a verb denoting the action they are performing and the name(s) of the table(s) they are acting on.
  • Triggers must be suffixed with _before or _after, and with _insert, _update, _write (for triggers that are run at insert and update time), or _delete.

Examples:

  • find_duplicate_address_before_update
  • fix_appointment_date_after_insert
  • remove_patient_dependents_before_delete
  • check_telephone_after_write

Table design

Every table should have created and updated columns which are automatically set, and re-set at insert and update time, respectively, by the ORM. If you bitch about this now, you will be the subject of ridicule and abuse when it saves our ass fixing a hideously bizarre race condition bug.

Consider using a deleted boolean column. It would be set to false by the ORM initially, and ORM queries automatically don't return columns where it is set to true. Actually deleting columns can be problematic when there are triggers and foreign key constraints; using a deleted flag lets you hide items from the user immediately and garbage collect later. Or you can use it to un-do deletion.

Any columns which reference a column in another table (i.e. columns used in joins) should have a foreign key constraint.

Avoid multi-column indexes because they take up lots of space on disk.

Be intelligent

SQL supports far more primitive data types than your average programming language. Use the proper SQL data types to store the right kind of data. Anyone who stores dates as strings, ints as characters, floats as decimals, booleans as ints, and so on will be the subject of abuse and ridicule.

Don't stick stuff in the DB that doesn't need to be there. Intermediate data in the process of being imported or converted, constrained sets such as ISO country or language codes, and application configuration data such as file paths don't belong in the database.