UNIVERSITY OF BRISTOL COMPUTING SERVICE

Designing a database

This appendix gives details of a Patient Care database, which illustrates some aspects of database design.

The data consist of details of each patient (date of birth, sex, etc), doctor, drugs available and details of treatment. The patients are identified by a reference number (PatientID) since family name and given name is not a suitable unique identifier. PatientID has data type Counter so that a new identifier is given automatically for each new patient but the data type of PatientID in the Treatment table must be long integer because the treatment details match an existing patient

By splitting the data like this, duplication of details such as drug name is avoided.

Database design

The following recommendations are made for anyone planning a database application:

Think carefully about the data you plan to store before you create your database. Don't rush into creating a database which has to be restructured and recreated later.

Take professional advice, for example contact the Computing Service to discuss your plans for the database before you start storing it and preferably before you start collecting the data.

Decide which columns are needed in each table. Each column (or field) should only hold one item of data for each row.

Each table should have a field, or combination of fields, which can be used to identify an entry (a row) uniquely. Any record in the PATIENT table can be identified by PatientID, in the DRUGS table by DrugID, the DOCTORS by DoctorID and in the TREATMENT table by a combination of PatientID, DoctorID and DrigID.

When designing the database, think about how many tables you need to ensure that data are not duplicated. In this database a separate table is used to store the treatment details. If the treatment details were stored in the Patient table as:

PATIENT (PatientID, . . . , gender, drug,dose,date)

a new row (including all the personal details) would need to be stored for each treatment. If 10 treatments were given the personal details would have to be entered 10 times.

Duplication of data wastes filespace and results in the data becoming inconsistent unless all copies of the duplicate information are entered and updated at the same time.

Another problem with this proposed structure is that it would be impossible to store the details of a patient unless he or she had at least one treatment. This is known as a storage anomaly. A similar anomaly would arise if there were no separate DRUG table - a drug information could be included only if there was an entry for a patient who took that drug.

Having planned your tables and the columns in each table, check to see that the database structure allows retrieval of all the information you need now or are likely to need in future.

Possible models

The way the details of drugs are stored in the database provides a good example of the sort of problems which can arise.

In this section various different ways of storing the drug information are looked at. Two retrievals are considered:

Is it possible to find out which patients took a specified drug?

Is it possible to find out which drug is taken most often?

Consider the situation where the details of drugs have been stored as one long comment field (a list such as `marshmallows, jelly beans') in the PATIENT table:

PATIENT (PatientID, . . . , telephone, drugs_taken)

It would be possible to find out which PATIENTs had taken marshmallows (by searching for the string `marshmallow' in the drugs_taken field), but it would not be possible to find out which drug had been taken the most often. Nor would it be possible to include more details about each individual drug and when it was taken

It would also be difficult to estimate how long the comment field needs to be, to accommodate all possible drugs broken by an individual.

It would be far easier to find the most common combination of drugs than if they had been stored in a separate table.

Consider the case where each drug has been stored as a separate field (drug1, drug2 etc), for example:

PATIENT (vno, . . . , telephone, drug1, drug2, drug3, . . . )

If the drugs are stored randomly, that is sometimes `marshmallow' is stored as drug1 and sometimes as drug2, it would be impossible to find out which drug had been broken the most often. It is impossible to find which individuals had taken marshmallows without searching every drug field.

It is necessary to know in advance the maximum number of drugs taken by an individual. If too few fields are created a new field would have to be added later.

If the drugs are stored in a specific order the fields should be given the drug name, for example:

PATIENT (vno, . . . , telephone, marshmallow, creme egg, . . . )

Here each field would contain `y' or `n' depending on whether or not the drug was taken. This means that even if only one drug was taken, all fields would have to be entered (or set to null). It also means that to search for an individual with no drugs taken, all the drugs fields would have to be searched.

There is no problem with finding which individuals had a taken marshmallow or which drug was taken most often.

As above, the number and type of drugs taken would have to be predicted in advance.

Creating a separate TREATMENT table as shown below solves all the problems highlighted above:

PATIENT (PatientID, . . . , telephone)
TREATMENT (PatientID, drugid, ..Datestart,dosage..)

Here the drug field contains the id of the taken drug (for example `DG11'). A new row is entered in the TREATMENT table for each drug taken. There is no limit on the number of drugs which can be stored for any PATIENT.

It is possible to calculate which drug is taken most often (using View/Totals).

The individuals who took Marshmallows can be found easily.

Unlike any of the structures above, this structure allows dates and dosages to be stored.