Notes for computer science: August 2012

Monday, 6 August 2012

software engineering bcs III SEM

Prescriptive Process Model

Perspective process model were originally proposed to bring order to the chaos of software development

• It defines a framework activities, s/w engineering actions, tasks, milestones, quality assurance and work products that are required to engineer high-quality software

• Each peocess model also prescribes a work flow, that is the manner in which the process element are interrelated to one another.

• The activities may be linear, incremental, or evolutionary

Waterfall Model

• It is also called classic life cycles, suggests a systemetic, sequential approach to s/w develoment,which begins with customer specification of requirements and progresses through planning,modeling,construction and deployment.

• It is the oldest software life cycle model and best understood by upper management

• It is used when requirements are well understood and risk is low.

• The work flow in waterfall model is in a linear (i.e., sequential) fashion

• It is used often with well-defined adaptations or enhancements to current software

Waterfall Model Problems:-

• It doesn't support iteration, so changes can cause confusion

• It is difficult for customers to state all requirements explicitly and up front

• It requires customer patience because a working version of the program doesn't occur until the final phase

• Problems can be somewhat solved in the model through the addition of feedback loops Waterfall Model with Feedback

•

• A variation in the representation of the waterfall model is called V-model.

The model shows the relationship of quality assurance actions and the actions associated with communication, modelling and construction activities.

As a s/w team move down to the left side of the V, the basic problem requirements are refined.

once code has been generated, the team moves up the right side of the V.

The V- model provide a way of visualizing how verification and validation actions are applied ro engineering work.

Incremental Model

• It is used when requirements are well understood

• In this model multiple independent deliveries are identified

• The work flow is in a linear (i.e., sequential) fashion within an increment and is staggered between increments

• It is iterative in nature; focuses on an operational product with each increment

• It provides a needed set of functionality sooner while delivering optional components later

It is useful also when staffing is too short for a full-scale development

Prototyping Model

• It follows an evolutionary and iterative approach

• It is used when requirements are not well understood

• It serves as a mechanism for identifying software requirements

• It focuses on those aspects of the software that are visible to the customer/user

• In this model Feedback is used to refine the prototype

Prototyping Model

(Potential Problems)

• The customer sees a "working version" of the software, wants to stop all development and then buy the prototype after a "few fixes" are made

• Developers often make implementation compromises to get the software running quickly (e.g., language choice, user interface, operating system choice, inefficient algorithms)

• Lesson learned

• Define the rules up front on the final disposition of the prototype before it is built

• In most circumstances, plan to discard the prototype and engineer the actual production software with a goal toward quality

Spiral Model

It is invented by Dr. Barry Boehm in 1988 while working at TRW

• It follows an evolutionary approach

• Used when requirements are not well understood and risks are high

• Inner spirals focus on identifying software requirements and project risks; may also incorporate prototyping

• Outer spirals take on a classical waterfall approach after requirements have been defined, but permit iterative growth of the software

• Operates as a risk-driven model…a go/no-go decision occurs after each complete spiral in order to react to risk determinations

• Requires considerable expertise in risk assessment

• Serves as a realistic model for large-scale software development

Rapid Application Development Model

Focuses on a SHORT DEVELOPMENT CYCLE.

 Requirements are well-understood and project scope is constrained.

 High-speed adaptation of the waterfall model , in which Rapid

development is achieved by using a component based construction

approach where each major function is completed in 60 to 90 days.

 Maps into generic framework activities.

 Communication : works to clearly understand the business problem.

 Planning: imp. As multiple teams work in parallel on different

system functions.

 Modeling: encompasses Business modeling, data modeling and

process modeling and establishes design representations.

 Construction:

 Deployment:

Incremental Model: As Increments or advancements are delivered

iteratively in every 60 to 90 days

Drawbacks:

 For large, but scalable projects, RAD requires sufficient HR to

create the right number of RAD teams that will be working in

parallel.

 If developers and customers are not committed to Rapid-Fire-

Activities necessary to complete the system, RAD projects fail.

 If the system cannot be properly modularized, building the

Components necessary for RAD will be problematic.

 If high performance is an issue, and performance is to be achieved

through tuning the interfaces to the subsystem components, then

RAD approach may not work.

 RAD may not be appropriate when technical risks are high.

Evolutionary Development Models

 Software like all complex systems , evolves over a period of time.

 Business and product requirement often change as development

proceeds, making a straight line path to an end product unrealistic.

 The set of core product / system requirements is well understood,

but the details of the product/ system extensions have yet to be

defined.

 Evolutionary models are iterative.

 They are characterized in a manner that software engineers to

develop increasingly more complete versions of the software.

 Types of Evolutionary Models

• Prototyping

• Spiral Model

• Concurrent Development model

General Weaknesses of Evolutionary Process Models

• Prototyping poses a problem to project planning because of the uncertain number of iterations required to construct the product

• Evolutionary software processes do not establish the maximum speed of the evolution

• If too fast, the process will fall into chaos

• If too slow, productivity could be affected

• Software processes should focus first on flexibility and extensibility, and second on high quality

• We should prioritize the speed of the development over zero defect Extending the development in order to reach higher quality could result in late delivery

Prototyping:-

• It follows an evolutionary and iterative approach

• It is used when requirements are not well understood

• It serves as a mechanism for identifying software requirements

• It focuses on those aspects of the software that are visible to the customer/user

• In this model Feedback is used to refine the prototype

Prototyping Model (Potential Problems):-

• The customer sees a "working version" of the software, wants to stop all development and then buy the prototype after a "few fixes" are made

• Developers often make implementation compromises to get the software running quickly (e.g., language choice, user interface, operating system choice, inefficient algorithms)

• Lesson learned

• Define the rules up front on the final disposition of the prototype before it is built

• In most circumstances, plan to discard the prototype and engineer the actual production software with a goal toward quality

Spiral Model

It is invented by Dr. Barry Boehm in 1988 while working at TRW

• It follows an evolutionary approach

• Used when requirements are not well understood and risks are high

• Inner spirals focus on identifying software requirements and project risks; may also incorporate prototyping

• Outer spirals take on a classical waterfall approach after requirements have been defined, but permit iterative growth of the software

• Operates as a risk-driven model…a go/no-go decision occurs after each complete spiral in order to react to risk determinations

• Requires considerable expertise in risk assessment

• Serves as a realistic model for large-scale software development

The Spiral Model

One of the most generic of all the models presented by Boehm in 1985.

 Most life cycle models can be derived as special cases of the spiral model.

 The spiral uses a risk management approach to software development.

 The spiral model is therefore iterative

 In each iteration analyze the results of the previous iteration, determine the risk and build a prototype

 During the first circuit, the objectives, alternatives and constraints

are defined.

 The Engineering may follow the life cycle approach or a prototyping

approach. The customer evaluates the work and makes suggestions for modifications..

A Spiral Model, combines the iterative nature of prototyping with the controlled and systematic aspects of the Waterfall Model, therein providing the potential for rapid development of incremental versions of the software. A Spiral Model is divided into a number of framework activities, also called task regions. These task regions could vary from 3-6 in number and they are:

• Customer Communication - tasks required to establish effective communication between the developer and customer.

• Planning - tasks required to define resources, timelines and other project related information /items.

• Risk Analysis - tasks required to assess the technical and management risks.

• Engineering - tasks required to build one or more representation of the application.

• Construction & Release - tasks required to construct, test and support (eg. Documentation and training)

• Customer evaluation - tasks required to obtain periodic customer feedback so that there are no last minute surprises.

Advantages of the Spiral Model

• Realistic approach to the development because the software evolves as the process progresses. In addition, the developer and the client better understand and react to risks at each evolutionary level.

• Focus on early error detection and design flaws

• Gives an early focus to reusable software

• Can be used for hardware-software system development

• The model uses prototyping as a risk reduction mechanism and allows for the development of prototypes at any stage of the evolutionary development.

• It maintains a systematic stepwise approach, like the classic waterfall model, and also incorporates into it an iterative framework that more reflect the real world.

Disadvantages of the Spiral Model

• One should possess considerable risk-assessment expertise

• It has not been employed as much proven models (e.g. the Waterfall Model) and hence may prove difficult to ‘sell’ to the client.

Concurrent development model

Fig. One element of concurrent model

 The concurrent process model defines a series of events that will trigger transition from state to state for each of the SW engineering activities, actions or tasks. All activities exist concurrently but reside in different states

 E. g. The modelling activity which existed in none state while initial communication was completed now makes a transition into under development state. If customer indicates that changes in requirements must be made, modelling activity moves from under development state into awaiting changes state

 Is applicable to all types of software development and provides an accurate picture of the current state of project

 Each activity, action, or task on the network exists simultaneously with other activities, actions, or tasks. Events generated at one point in the process network trigger transitions among the states

General Weaknesses of Evolutionary Process Models

• Prototyping poses a problem to project planning because of the uncertain number of iterations required to construct the product

• Evolutionary software processes do not establish the maximum speed of the evolution

• If too fast, the process will fall into chaos

• If too slow, productivity could be affected

• Software processes should focus first on flexibility and extensibility, and second on high quality

• We should prioritize the speed of the development over zero defects

• Extending the development in order to reach higher quality could result in late delivery

A final word on evolutionary processes:-

 Modern computer software is characterized by continual change, by very tight time lines and by an emphatic need for customer-user satisfaction.

 Evolutionary process models were conceived to address these issues.

We have some concern with evolutionary model

• The first concern is that the prototyping poses a problem to project planning because of the uncertain number of cycles required to construct the product.

• Second, evolutionary software processes do not establish the maximum speed of the evolution.

• Third software processes should focused on flexibility and extensibility rather than on high quality.

The intend of evolutionary models is to develop high quality software in an iterative or incremental manner.

file Organisation:-

FILE ORGANISATION AND ITS TYPES

• Just as arrays, lists, trees and other data structures are used to implement data Organisation in main memory,.

• a number of strategies are used to support the Organisation of data in secondary memory.

• A file organisation is a technique to organise data in the secondary memory.

• File Organisation is a way of arranging the records in a file when the file is stored on the disk.

• Data files are organized so as to facilitate access to records and to ensure their efficient storage.

• . Selection of File Organisations is dependent on two factors as shown below:

1. Typical DBMS applications need a small subset of the DB at any given time.

2. When a portion of the data is needed it must be located on disk, copied to

memory for processing and rewritten to disk if the data was modified.

• A DBMS supports several file Organisation techniques. The important task of the DBA is to choose a good Organisation for each file, based on its type of use.

• The particular organisation most suitable for any application will depend upon such factors as the kind of external storage available, types of queries allowed, number of keys, mode of retrieval and mode of update.

The Figure1 illustrates different file organisations based on an access key.

Figure 1: File Organisation techniques

1 Heap files (unordered file)

• Basically these files are unordered files.

• It is the simplest and most basic type. T

• heap files consist of randomly ordered records.

• The records will have no particular order.

• The operations we can perform on the records are insert, retrieve and delete.

• The features of the heap file or the pile file Organisation are:

1. New records can be inserted in any empty space that can accommodate them.

2. When old records are deleted, the occupied space becomes empty and available for any new insertion.

3. If updated records grow; they may need to be relocated (moved) to a new empty space. This needs to keep a list of empty space.

Advantages of heap files

1. This is a simple file Organisation method.

2. Insertion is somehow efficient.

3. Good for bulk-loading data into a table.

4. Best if file scans are common or insertions are frequent.

Disadvantages of heap files

1. Retrieval requires a linear search and is inefficient.

2. Deletion can result in unused space/need for reorganisation.

2 Sequential File Organisations

• The most basic way to organise the collection of records in a file is to use sequential Organisation.

• Records of the file are stored in sequence by the primary key field values.

• They are accessible only in the order stored, i.e., in the primary key order.

• This kind of file Organisation works well for tasks which need to access nearly every record in a file, e.g., payroll.

• In a sequentially organised file records are written consecutively when the file is created and must be accessed consecutively when the file is later used for input (Figure 2).

Acno	Name	City
101	John	Latur
102	Raman	Ahmedpur
103	Sagar	Latur

Figure 2: Structure of sequential file

• A sequential file maintains the records in the logical sequence of its primary key values.

• Sequential files are inefficient for random access, however, are suitable for sequential access.

• A sequential file can be stored on devices like magnetic tape that allow sequential access.

• On an average, to search a record in a sequential file would require to look into half of the records of the file.

File Organisations in DBMS

• Updating a sequential file usually creates a new file so that the record sequence on primary key is maintained.

• The update operation first copies the records till the record after which update is required into the new file and then the updated record is put followed by the remainder of records.

• Thus method of updating a sequential file automatically creates a backup copy.

• Adding a record requires shifting of all records from the point of insertion to the end of file to create space for the new record.

• On the other hand deletion of a record requires a compression of the file space.

• However, simple queries are time consuming for large files.

Advantages of Sequential File Organisation

• It is fast and efficient when dealing with large volumes of data that need to be processed periodically (batch system).

Disadvantages of sequential File Organisation

• Requires that all new transactions be sorted into the proper sequence for sequential access processing.

• Locating, storing, modifying, deleting, or adding records in the file require rearranging the file.

• This method is too slow to handle applications requiring immediate updating or responses.

3 Indexed (Indexed Sequential) File Organisation

• It organises the file like a large dictionary, i.e., records are stored in order of the key but an index is kept which also permits a type of direct access.

• The records are stored sequentially by primary key values and there is an index built over the primary key field.

• The retrieval of a record from a sequential file, on average, requires access to half the records in the file, making such inquiries not only inefficient but very time consuming for large files.

• To improve the query response time of a sequential file, a type of indexing technique can be added.

• Thus, an index is a mechanism for faster search.

• An index can be small enough to be read into the main memory.

• A sequential (or sorted on primary keys) file that is indexed on its primary key is called an index sequential file.

• The index allows for random access to records, while the sequential storage of the records of the file provides easy access to the sequential records.

Fig. Indexed file organiaation

4.4.4 Hashed File Organisation

• Hashing is the most common form of purely random access to a file or database.

• It is also used to access columns that do not have an index as an optimisation technique.

• Hash functions calculate the address of the page in which the record is to be stored based on one or more fields in the record.

• The records in a hash file appear randomly distributed across the available space.

• It requires some hashing algorithm and the technique.

• Hashing Algorithm converts a primary key value into a record address. The most popular form of hashing is division hashing with chained overflow.

Advantages of Hashed file Organisation

1. Insertion or search on hash-key is fast.

2. Best if equality search is needed on hash-key.

Disadvantages of Hashed file Organisation

1. It is a complex file Organisation method.

2. Search is slow.

3. It suffers from disk space overhead.

4. Unbalanced buckets degrade performance.

5. Range search is slow.

Overview of indexes:-

Basic Concepts

· Indexing mechanisms are used to speed up access to desired data.

· Search Key - attribute to set of attributes used to look up records in a file.

· An index file consists of records (called index entries) of the form

· Index files are typically much smaller than the original file

· Two basic kinds of indices:

o Ordered indices: search keys are stored in sorted order

o Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”.

Index Evaluation Metrics:-

· Access time

· Insertion time

· Deletion time

· Space overhead

· Access types supported efficiently. E.g.,

o records with a specified value in the attribute

o or records with an attribute value falling in a specified range of values.

o This strongly influences the choice of index, and depends on usage.

Ordered Indices:-

· In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library.

· Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.

o Also called clustering index

o The search key of a primary index is usually but not necessarily the primary key.

· Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called
non-clustering index.

· Index-sequential file: ordered sequential file with a primary index.

Dense Index Files:-

· Dense index — Index record appears for every search-key value in the file.

Sparse Index Files:-

· Sparse Index: contains index records for only some search-key values.

· Only applicable when records are sequentially ordered on search-key

· To locate a record with search-key value K we:

· Find index record with largest search-key value < K

· Search file sequentially starting at the record to which the index record points

Multilevel Index:-

· If primary index does not fit in memory, access becomes expensive.

· Solution: treat primary index kept on disk as a sequential file and construct a sparse index on it.

o outer index – a sparse index of primary index

o inner index – the primary index file

· If even outer index is too large to fit in main memory, yet another level of index can be created, and so on.

· Indices at all levels must be updated on insertion or deletion from the file.

Secondary Indices:-

· Frequently, one wants to find all the records whose values in a certain field (which is not the search-key of the primary index) satisfy some condition.

o Example 1: In the account relation stored sequentially by account number, we may want to find all accounts in a particular branch

o Example 2: as above, but where we want to find all accounts with a specified balance or range of balances

· We can have a secondary index with an index record for each search-key value

· Index record points to a bucket that contains pointers to all the actual records with that particular search-key value.

· Secondary indices have to be dense, since the file is not sorted by the search key.

Tree structured indexing:-

B⁺-Tree Index Files

B⁺-tree indices are an alternative to indexed-sequential files.

· Disadvantage of indexed-sequential files

o performance degrades as file grows, since many overflow blocks get created.

o Periodic reorganization of entire file is required.

· Advantage of B⁺-tree index files:

o automatically reorganizes itself with small, local, changes, in the face of insertions and deletions.

o Reorganization of entire file is not required to maintain performance.

· (Minor) disadvantage of B⁺-trees:

o extra insertion and deletion overhead, space overhead.

· Advantages of B⁺-trees outweigh disadvantages

o B⁺-trees are used extensively

B⁺-Tree Node Structure

· Typical node

o K_i are the search-key values

o P_i are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).

· The search-keys in a node are ordered

o K₁< K₂< K₃< . . .< K_n–₁

· Usually the size of a node is that of a block

· Example of a B⁺-tree

B⁺-tree for account file (n = 3)

B⁺-Tree Index File

· All paths from root to leaf are of the same length

· Each node that is not a root or a leaf has between én/2ù and n children.

· A leaf node has between é(n–1)/2ù and n–1 values

· Special cases:

o If the root is not a leaf, it has at least 2 children.

o If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n–1) values.

Leaf Nodes in B⁺-Trees

Properties of a leaf node:

· For i = 1, 2, . . ., n–1, pointer P_i either points to a file record with search-key value K_i, or to a bucket of pointers to file records, each record having search-key value K_i. Only need bucket structure if search-key does not form a primary key.

· If L_i, L_j are leaf nodes and i < j, L_i’s search-key values are less than L_j’s search-key values

· P_n points to next leaf node in search-key order

Non-Leaf Nodes in B⁺-Trees

· Non leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers:

o All the search-keys in the subtree to which P₁ points are less than K₁

o For 2 £ i £ n – 1, all the search-keys in the subtree to which P_i points have values greater than or equal to K_i_–1 and less than K_i

o All the search-keys in the subtree to which P_n points have values greater than or equal to K_n_–1

Example of B⁺-tree

B⁺-tree for account file (n = 5)

· Leaf nodes must have between 2 and 4 values
(é(n–1)/2ù and n –1, with n = 5).

· Non-leaf nodes other than root must have between 3 and 5 children (é(n/2ù and n with n =5).

· Root must have at least 2 children.

§ Observations about B⁺-trees

· Since the inter-node connections are done by pointers, “logically” close blocks need not be “physically” close.

· The non-leaf levels of the B⁺-tree form a hierarchy of sparse indices.

· The B⁺-tree contains a relatively small number of levels

§ Level below root has at least 2* én/2ù values

§ Next level has at least 2* én/2ù * én/2ù values

§ .. etc.

o If there are K search-key values in the file, the tree height is no more than é log_é_n_/2_ù(K)ù

o thus searches can be conducted efficiently.

· Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time (as we shall see some details, and more in the book).