Database System Concepts and Architecture

Pre-requisite

To understand the content of this blog, it's important to have a basic understanding of a database management system (DBMS).

You can study that from here.

Contents of this blog

A basic client/server DBMS Architecture
Data Models, Schemas, Instances
Three-Schema Architecture and Data Independence
Database Languages and Interfaces

A basic client/server DBMS Architecture

In this architecture, the system functionality is distributed between two modules:

Client Module: A client module is typically designed to run on a mobile device, user workstation, or personal computer (PC). Application programs and user interfaces that access the database typically run in the client module. Hence, the client module handles user interaction and provides user-friendly interfaces such as apps for mobile devices, or forms- or menu-based GUIs (graphical user interfaces) for PCs.
Server Module: A server module typically handles data storage, access, search, and other database functions. It receives requests from client modules, processes them against the stored database, and returns the results.

Data Models, Schemas, Instances

The main feature of a database approach is that it provides some level of data abstraction. Data Abstraction generally refers to suppressing details of data organization and storage and highlighting the essential features for an improved understanding of data.

A Data Model is a collection of concepts that describe the structure of a database. The database structure includes data types, relationships, and constraints that apply to the data. Most data models also include a set of basic operations for specifying retrievals and updates on the database.

Modern databases are not limited to static structure: a data model can also include concepts for specifying behavior, that is, operations and functions that act on data objects. For example, a user-defined operation such as COMPUTE_GPA can be applied to a STUDENT object to calculate that student's GPA. Specifying behavior lets the database designer define the set of valid operations that users can perform on the database objects.
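To make the idea of behavior concrete, here is a minimal Python sketch of a STUDENT object with a compute_gpa operation. The class layout, the field names, and the 4.0 grade scale are assumptions made for this illustration; the text above only names the COMPUTE_GPA operation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed grade-point scale; not specified in the text.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


@dataclass
class Student:
    name: str
    # Each entry is (letter_grade, credit_hours).
    grades: List[Tuple[str, int]] = field(default_factory=list)

    def compute_gpa(self) -> float:
        """Credit-weighted average of grade points; 0.0 if there are no grades."""
        total_credits = sum(credits for _, credits in self.grades)
        if total_credits == 0:
            return 0.0
        points = sum(GRADE_POINTS[grade] * credits for grade, credits in self.grades)
        return points / total_credits


student = Student("Smith", [("A", 3), ("B", 4), ("C", 3)])
gpa = student.compute_gpa()
```

The GPA is not stored anywhere; it is computed from the stored grades each time the operation is called, which is the sense in which the behavior is part of the object's definition.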

Categories of Data Models

Many data models have been proposed, which we can categorize according to the types of concepts they use to describe the database structure. High-level or conceptual data models provide concepts that are close to the way many users perceive data, whereas low-level or physical data models provide concepts that describe the details of how data is stored on the computer storage media, typically magnetic disks. Concepts provided by physical data models are generally meant for computer specialists, not for end users. Between these two extremes is a class of representational (or implementation) data models, which provide concepts that may be easily understood by end users but that are not too far removed from the way data is organized in computer storage. Representational data models hide many details of data storage on disk but can be implemented on a computer system directly.

Conceptual data models use concepts such as

Entity: An entity is a real-world object or concept, like a person, place, thing, or event, that is represented in a database. For example, an employee, a car, or a project.
Attribute: An attribute represents some property of interest that further describes an entity, such as the employee’s name or salary.
Relationship: A relationship among two or more entities represents an association among the entities, for example, a works-on relationship between an employee and a project.
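The three concepts above can be sketched with plain Python dataclasses. This is only an illustration of the vocabulary, not a modeling tool; the class names, the sample values, and the hours attribute on the relationship are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:      # entity
    name: str        # attribute
    salary: float    # attribute


@dataclass(frozen=True)
class Project:       # entity
    title: str       # attribute


@dataclass(frozen=True)
class WorksOn:       # relationship between an Employee and a Project
    employee: Employee
    project: Project
    hours: float     # assumed attribute of the relationship itself


alice = Employee("Alice", 85000.0)
apollo = Project("Apollo")
works_on = [WorksOn(alice, apollo, 20.0)]


def projects_of(employee, relationships):
    """Follow the works-on relationship from an employee to project titles."""
    return [r.project.title for r in relationships if r.employee == employee]
```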

Representational or implementation data models are the models used most frequently in traditional commercial DBMSs. These include the widely used relational data model, as well as the so-called legacy data models—the network and hierarchical models—that have been widely used in the past.

Representational data models represent data by using record structures and hence are sometimes called record-based data models.

Physical data models describe how data is stored as files in the computer by representing information such as record formats, record orderings, and access paths. An access path is a search structure, such as an index or a hash structure, that makes the search for particular database records efficient. An index, for example, allows direct access to data using an index term or a keyword.
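The difference an access path makes can be sketched in a few lines of Python. Here a dictionary plays the role of a hash index over an in-memory list of records; a real DBMS builds such structures over disk files, so treat this as an analogy with made-up sample data.

```python
records = [
    {"id": 101, "name": "Smith"},
    {"id": 205, "name": "Wong"},
    {"id": 317, "name": "Zelaya"},
]


def scan_lookup(records, key):
    """No access path: examine the records one by one."""
    for rec in records:
        if rec["id"] == key:
            return rec
    return None


def build_hash_index(records):
    """Access path: map each key to the position of its record."""
    return {rec["id"]: pos for pos, rec in enumerate(records)}


def index_lookup(records, index, key):
    """Jump straight to the record through the index."""
    pos = index.get(key)
    return records[pos] if pos is not None else None


id_index = build_hash_index(records)
```

Both lookups return the same record; the indexed one simply avoids touching every record on the way.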

Another class of data models is known as self-describing data models. The data storage in systems based on these models combines the description of the data with the data values themselves. In traditional DBMSs, the description (schema) is separated from the data. These models include XML as well as many of the key-value stores and NoSQL systems that were recently created for managing big data.
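A small Python sketch of the contrast, using JSON as a stand-in for a self-describing format. The field names and values are invented for the example.

```python
import json

# Traditional style: the description (schema) is kept apart from the values.
schema = ("name", "student_number", "major")
row = ("Smith", 17, "CS")

# Self-describing style: every document carries its own field names.
document = json.dumps({"name": "Smith", "student_number": 17, "major": "CS"})


def read_row(schema, row):
    """The row is meaningless without the separately stored schema."""
    return dict(zip(schema, row))


def read_document(document):
    """The document can be interpreted on its own."""
    return json.loads(document)
```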

Schemas, Instances, and Database State

In a data model, it is important to distinguish between the description of the database and the database itself. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams. A displayed schema is called a schema diagram. An example of a schema diagram is shown below.

The diagram displays the structure of each record type but not the actual instances of the records. Each object in the schema, such as STUDENT or COURSE, is called a schema construct.

The actual data in a database may change quite frequently. The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the database.

Now let's look more closely at database states.

A database schema is the blueprint of the database. It defines how the data is organized: tables, columns, and relationships. When we define a new database, we specify only its database schema to the DBMS. At this point, the corresponding database state is the empty state with no data. We get the initial state of the database when the database is first populated or loaded with the initial data. From then on, every time an update operation is applied to the database, we get another database state. At any point in time, the database has a current state. The DBMS is partly responsible for ensuring that every state of the database is valid, that is, a state that satisfies the structure and constraints specified in the schema. The DBMS stores the schema descriptions (called metadata) in a catalog so that it can refer to them whenever it needs to. The schema is sometimes called the intension, and a database state is called an extension of the schema.
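The sequence of empty state, initial state, and later states can be traced with Python's built-in sqlite3 module. This is a sketch under assumptions: the student table and its columns are invented, and SQLite's sqlite_master table stands in for the catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define the schema only; the database state is empty at this point.
conn.execute(
    "CREATE TABLE student (name TEXT NOT NULL, student_number INTEGER PRIMARY KEY)"
)


def current_state(conn):
    """Return a snapshot of the data currently in the student table."""
    return conn.execute(
        "SELECT name, student_number FROM student ORDER BY student_number"
    ).fetchall()


empty_state = current_state(conn)

# Loading the first data produces the initial state.
conn.execute("INSERT INTO student VALUES ('Smith', 17)")
initial_state = current_state(conn)

# Every further update yields another state.
conn.execute("INSERT INTO student VALUES ('Brown', 8)")
later_state = current_state(conn)

# The schema description itself is stored in SQLite's catalog table.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

# A state that would violate the schema's constraints is rejected.
try:
    conn.execute("INSERT INTO student VALUES (NULL, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Note that the schema (one row in the catalog) stays the same throughout, while the state changes with every insert.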

Three-Schema Architecture and Data Independence

The three-schema architecture was proposed to help achieve and visualize three of the four important characteristics of the database approach:

use of a catalog to store the database description (schema), which makes the database self-describing
insulation of programs and data (program-data and program-operation independence)
support of multiple user views

Let’s talk more about this.

The Three-Schema Architecture

The main goal of this architecture is to separate the user applications from the physical database. In this architecture, schemas can be defined at the following three levels:

The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.
The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. Usually, a representational data model is used to describe the conceptual schema when a database system is implemented. This implementation conceptual schema is often based on a conceptual schema design in a high-level data model.
The external or view level includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. As in the previous level, each external schema is typically implemented using a representational data model, possibly based on an external schema design in a high-level conceptual data model.

Notice that the three schemas are only descriptions of data; the actual data is stored at the physical level only. In the three-schema architecture, each user group refers to its own external schema. Hence, the DBMS must transform a request specified on an external schema into a request against the conceptual schema, and then into a request on the internal schema for processing over the stored database. If the request is a database retrieval, the data extracted from the stored database must be reformatted to match the user’s external view. The processes of transforming requests and results between levels are called mappings.
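As a loose analogy, the three levels can be mimicked in SQLite from Python: a table for the conceptual level, an index as a piece of the internal level, and a view for the external level. The employee table, the view name, and the sample rows are assumptions, and SQLite does not expose a formal three-schema separation, so this only sketches the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the structure of the whole database.
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL
    );
    -- Internal level: a storage-side access path.
    CREATE INDEX idx_employee_dept ON employee (dept);
    -- External level: one user group's view, which hides salary.
    CREATE VIEW employee_directory AS SELECT name, dept FROM employee;

    INSERT INTO employee VALUES
        (1, 'Smith', 'Research', 55000),
        (2, 'Wong', 'Admin', 43000);
""")

# The request is phrased against the external schema only; the DBMS
# maps it to the underlying table and reshapes the result to fit the view.
view_cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
view_columns = [col[0] for col in view_cursor.description]
view_rows = view_cursor.fetchall()
```

The user of employee_directory never sees the salary column or the index, yet the query is answered from the same stored data.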

Data Independence

The three-schema architecture can be used to further explain the concept of data independence, which can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. We can define two types of data independence:

Logical data independence is the capacity to change the conceptual schema without having to change external schemas or application programs. The conceptual schema may be changed to expand the database (by adding a table or column), to change constraints, or to reduce the database (by removing a table or column). In the last case, external schemas that refer only to the remaining data should not be affected. In a DBMS that supports logical data independence, only the view definitions and the mappings need to be changed, so applications that use the external schemas continue to work as before.
Physical data independence is the capacity to change the internal schema without having to change the conceptual schema. Hence, the external schemas need not be changed either. Changes to the internal schema may be needed because some physical files were reorganized, for example by creating additional access structures, to improve the performance of retrieval or update.

Generally, physical data independence exists in most databases and file environments where physical details, such as the exact location of data on disk, and hardware details of storage encoding, placement, compression, splitting, merging of records, and so on are hidden from the user. Applications remain unaware of these details. On the other hand, logical data independence is harder to achieve because it allows structural and constraint changes without affecting application programs—a much stricter requirement.
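Both kinds of independence can be observed in a small SQLite experiment from Python. The table, view, and index names are assumptions, and the app_query function stands in for an application program that only knows its external view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    CREATE VIEW staff_names AS SELECT name FROM employee;
    INSERT INTO employee VALUES (1, 'Smith', 'Research'), (2, 'Wong', 'Admin');
""")


def app_query(conn):
    """Stands in for an application that only uses the external view."""
    return conn.execute("SELECT name FROM staff_names ORDER BY name").fetchall()


before = app_query(conn)

# Physical change: add an access path. The conceptual schema is untouched.
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")
after_physical = app_query(conn)

# Logical change: expand the conceptual schema with a new column.
conn.execute("ALTER TABLE employee ADD COLUMN salary REAL")
after_logical = app_query(conn)
```

The application code is identical in all three calls. The example only expands the schema; removing a column that the view depends on would be a change the view could not absorb.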

Database Languages and Interfaces

In this section, we discuss the types of languages and interfaces provided by a DBMS and the user categories targeted by each interface.

DBMS Languages

Several kinds of languages are used to manage and interact with a database system:

DDL: The Data Definition Language (DDL) is used to define the structure and schema of the database. It allows database designers and administrators to create, modify, and delete schema constructs such as tables, indexes, and constraints. For example, SQL statements like CREATE TABLE or ALTER TABLE are DDL commands. In many DBMSs where no strict separation of levels is maintained, a single DDL is used to define both the conceptual and internal schemas.
SDL: The Storage Definition Language (SDL) is used to specify the internal schema, focusing on physical storage structures such as indexing and file organization. Most modern relational DBMSs have no separate SDL; instead, the internal schema is specified through a combination of functions, system parameters, and administrative tools related to file storage.
VDL: The View Definition Language (VDL) is used to define user views, enabling specific ways of accessing and interacting with data. Modern systems typically integrate VDL functionality into SQL using commands like CREATE VIEW. By defining views, users can focus on relevant data without concern for the underlying schema.
DML: The Data Manipulation Language (DML) provides tools for querying and modifying data within the database. It is divided into two types: high-level (nonprocedural) and low-level (procedural). High-level DML, such as SQL commands (SELECT, INSERT, UPDATE, DELETE), enables users to specify what data they want to retrieve or modify without detailing how to do so, making it set-oriented and efficient for handling multiple records at once. In contrast, low-level DML operates on individual records, requiring procedural steps like looping through data for processing, which gives more control but demands greater effort.
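The contrast between set-oriented and record-at-a-time DML can be sketched with sqlite3. The employee table and the 10% raise are invented for the example, and the second function only emulates a low-level DML by looping in the host language, since SQLite itself offers no procedural record-level DML.

```python
import sqlite3


def make_db():
    """Build a small in-memory database with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, dept TEXT, salary REAL)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Research", 100.0), (2, "Research", 200.0), (3, "Admin", 300.0)],
    )
    return conn


def raise_set_oriented(conn):
    """High-level DML: one statement says what to change, not how."""
    conn.execute("UPDATE employee SET salary = salary * 1.1 WHERE dept = 'Research'")


def raise_record_at_a_time(conn):
    """Low-level style: fetch each record and process it in a loop."""
    rows = conn.execute("SELECT id, dept, salary FROM employee").fetchall()
    for emp_id, dept, salary in rows:
        if dept == "Research":
            conn.execute(
                "UPDATE employee SET salary = ? WHERE id = ?",
                (salary * 1.1, emp_id),
            )


def salaries(conn):
    """Salaries ordered by id, rounded to hide floating-point noise."""
    return [round(s, 2) for (s,) in conn.execute(
        "SELECT salary FROM employee ORDER BY id")]


db_a, db_b = make_db(), make_db()
raise_set_oriented(db_a)
raise_record_at_a_time(db_b)
```

Both databases end up in the same state; the set-oriented version leaves the choice of how to find and update the rows to the DBMS.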

Modern DBMSs often use integrated languages like SQL, which combine DDL, VDL, and DML functionalities. This integration simplifies operations, providing a unified approach to schema definition, data manipulation, and user view management.

DBMS Interfaces

User-friendly interfaces provided by a DBMS may include the following:

Menu-based Interfaces for Web Clients or Browsing. Menu-based interfaces guide users through requests by offering step-by-step options from a menu. These eliminate the need to remember complex query syntax. Popular in web-based applications, pull-down menus enable users to navigate or browse database contents in an unstructured way, making them user-friendly and accessible.
Apps for Mobile Devices. Mobile apps provide specialized interfaces for users to access their data on smartphones or tablets. Common examples include banking apps for transactions or reservation apps for bookings. These apps offer secure login features and predefined options like payments, inquiries, and updates tailored for mobile environments.
Forms-based Interfaces. Forms-based interfaces present users with structured forms to input or retrieve data. Users can fill out all or part of the form to perform actions like inserting new data or retrieving matches. These interfaces are typically designed for naive users to interact with predefined, “canned” transactions efficiently.
Graphical User Interfaces. GUIs allow users to interact with databases through visual diagrams representing schemas. Users can execute queries by manipulating graphical elements. Often, GUIs incorporate forms and menus, creating an intuitive and visually engaging way to interact with data.
Natural Language Interfaces. Natural language interfaces enable users to write queries in plain language, like English. These systems interpret requests using dictionaries and schemas, converting them into database queries. If ambiguities arise, the system prompts the user for clarification before processing the query.
Keyword-based Database Search. Similar to search engines, keyword-based interfaces match user-inputted strings with database records or documents. Using predefined indexes and ranking algorithms, these interfaces retrieve relevant results. Though not common in structured databases, keyword-based querying is a growing research area.
Speech Input and Output. Speech-based interfaces use voice commands to query databases or provide results through spoken output. Popular in limited vocabulary applications, like flight inquiries or account information, these systems convert voice input into queries and generate voice responses for ease of use.
Interfaces for Parametric Users. Parametric interfaces are tailored for users performing repetitive tasks, like bank tellers. These systems use simple function keys for common operations, such as deposits or balance checks, minimizing keystrokes and maximizing efficiency for routine transactions.
Interfaces for the DBA. Database administrators (DBAs) use privileged interfaces for tasks like creating accounts, setting system parameters, managing authorizations, and optimizing storage. These interfaces provide advanced tools for managing and maintaining database systems effectively.
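Forms-based and parametric interfaces both sit on top of canned transactions: the query is fixed in advance and the user supplies only the parameters. A minimal sketch with sqlite3 follows; the account table, the deposit function, and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (number INTEGER PRIMARY KEY, balance REAL NOT NULL)"
)
conn.execute("INSERT INTO account VALUES (1001, 250.0)")


def deposit(conn, number, amount):
    """Canned transaction: the SQL is fixed, the user supplies parameters."""
    if amount <= 0:
        raise ValueError("deposit amount must be positive")
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE account SET balance = balance + ? WHERE number = ?",
            (amount, number),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no account {number}")


def balance(conn, number):
    """Canned balance inquiry; returns None for an unknown account."""
    row = conn.execute(
        "SELECT balance FROM account WHERE number = ?", (number,)
    ).fetchone()
    return None if row is None else row[0]


deposit(conn, 1001, 50.0)
```

A teller's function key or a filled-in form would end up calling something like deposit with the entered values, so the user never writes a query.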

This is the end of this blog.

In the next blog, we will learn about Data Modeling Using the entity-relationship (ER) Model. If you want notifications about upcoming blogs, you can subscribe and follow me (Parth Gupta) on Hashnode.

Thank you for Reading.