Monday, August 27, 2012

Computer Performance by Mr. Wiki


Computer performance is characterized by the amount of useful work accomplished by a computer system compared to the time and resources used.
Depending on the context, good computer performance may involve one or more of the following:
  • Short response time for a given piece of work
  • High throughput (rate of processing work)
  • Low utilization of computing resource(s)
  • High availability of the computing system or application
  • Fast (or highly compact) data compression and decompression
  • High bandwidth / short data transmission time

Performance metrics
Computer performance metrics include availability, response time, channel capacity, latency, completion time, service time, bandwidth, throughput, relative efficiency, scalability, performance per watt, compression ratio, instruction path length and speed up. CPU benchmarks are available.

Aspect of software quality

Computer software performance, particularly software application response time, is an aspect of software quality that is important in human–computer interactions.

Technical and non-technical definitions

The performance of any computer system can be evaluated in measurable, technical terms, using one or more of the metrics listed above. This way the performance can be
- compared relative to other systems or the same system before/after changes
- defined in absolute terms, e.g. for fulfilling a contractual obligation
Whilst the above definition relates to a scientific, technical approach, the following definition given by Arnold Allen would be useful for a non-technical audience:
The word performance in computer performance means the same thing that performance means in other contexts, that is, it means "How well is the computer doing the work it is supposed to do?"

Technical performance metrics

There are a wide variety of technical performance metrics that indirectly affect overall computer performance.
Because there are too many programs to test a CPU's speed on all of them, benchmarks were developed. The most famous benchmarks are the SPECint and SPECfp benchmarks developed by Standard Performance Evaluation Corporation and the ConsumerMark benchmark developed by the Embedded Microprocessor Benchmark Consortium EEMBC.
Some important measurements include:
  • Instructions per second – Most consumers pick a computer architecture (normally Intel IA32 architecture) to be able to run a large base of pre-existing, pre-compiled software. Being relatively uninformed on computer benchmarks, some of them pick a particular CPU based on operating frequency .
  • FLOPS – The number of floating-point operations per second is often important in selecting computers for scientific computations.
  • Performance per watt – System designers building parallel computers, such as Google, pick CPUs based on their speed per watt of power, because the cost of powering the CPU outweighs the cost of the CPU itself. 
  • Some system designers building parallel computers pick CPUs based on the speed per dollar.
  • System designers building real-time computing systems want to guarantee worst-case response. That is easier to do when the CPU has low interrupt latency and when it has deterministic response. 
  • Computer programmers who program directly in assembly language want a CPU to support a full-featured instruction set.
  • Low power – For systems with limited power sources (e.g. solar, batteries, human power).
  • Small size or low weight - for portable embedded systems, systems for spacecraft.
  • Environmental impact – Minimizing environmental impact of computers during manufacturing and recycling as well as during use. Reducing waste, reducing hazardous materials. 
  • Giga-updates per second - a measure of how frequently the RAM can be updated
Occasionally a CPU designer can find a way to make a CPU with better overall performance by improving one of these technical performance metrics without sacrificing any other (relevant) technical performance metric—for example, building the CPU out of better, faster transistors. However, sometimes pushing one technical performance metric to an extreme leads to a CPU with worse overall performance, because other important technical performance metrics were sacrificed to get one impressive-looking number—for example, the megahertz myth.

Performance Equation

The total amount of time (t) required to execute a particular benchmark program is
t=N*C/f , or equivalently
P=I*f/N
where
  • P = 1/t is "the performance" in terms of time-to-execute
  • N is the number of instructions actually executed (the instruction path length). The code density of the instruction set strongly affects N. The value of N can either be determinedexactly by using an instruction set simulator (if available) or by estimation—itself based partly on estimated or actual frequency distribution of input variables and by examining generated machine code from an HLL compiler. It cannot be determined from the number of lines of HLL source code. N is not affected by other processes running on the same processor. The significant point here is that hardware normally does not keep track of (or at least make easily available) a value of N for executed programs. The value can therefore only be accurately determined by instruction set simulation, which is rarely practiced.
  • f is the clock frequency in cycles per second.
  • C=1/I is the average cycles per instruction (CPI) for this benchmark.
  • I=1/C is the average instructions per cycle (IPC) for this benchmark.
Even on one machine, a different compiler or the same compiler with different compiler optimization switches can change N and CPI—the benchmark executes faster if the new compiler can improve N or C without making the other worse, but often there is a trade-off between them—is it better, for example, to use a few complicated instructions that take a long time to execute, or to use instructions that execute very quickly, although it takes more of them to execute the benchmark?
A CPU designer is often required to implement a particular instruction set, and so cannot change N. Sometimes a designer focuses on improving performance by making significant improvements in f (with techniques such as deeper pipelines and faster caches), while (hopefully) not sacrificing too much C—leading to a speed-demon CPU design. Sometimes a designer focuses on improving performance by making significant improvements in CPI (with techniques such as out-of-order execution, superscalar CPUs, larger caches, caches with improved hit rates, improved branch prediction, speculative execution, etc.), while (hopefully) not sacrificing too much clock frequency—leading to a brainiac CPU design. For a given instruction set (and therefore fixed N) and semiconductor process, the maximum single-thread performance (1/t) requires a balance between brainiac techniques and speedracer techniques.

|Operating System| Memory management


Memory Management
(SOURCE: http://stargazer.bridgeport.edu/sed/projects/cs503/Spring_2001/kode/os/memory.htm and Wikipedia)

Memory management is the act of managing computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and freeing it for reuse when no longer needed. This is critical to the computer system.
Several methods have been devised that increase the effectiveness of memory management. Virtual memory systems separate the memory addresses used by a process from actual physical addresses, allowing separation of processes and increasing the effectively available amount of RAM using paging or swapping to secondary storage. The quality of the virtual memory manager can have an extensive effect on overall system performance.

Memory Manager
The purpose of the memory manager is
  • to allocate primary memory space to processes
  • to mao the process address space into the allocated portion of the primary memory
  • to minimize access times using a cost-effective amount of primary memory

Memory Management Algorithms
In an environment that supports dynamic memory allocation, the memory manager must keep a record of the usage of each allocatable block of memory. This record could be kept by using almost any data structure that implements linked lists. An obvious implementation is to define a free list of block descriptors, with each descriport containing a pointer to the next descriptor, a pointer to the block, and the length of the block. The memory manager keeps a free list pointer and inserts entries into the list in some order conducive to its allocation strategy. A number of strategies are used to allocate space to the processes that are competing for memory.

Best Fit - The allocator places a process in the smallest block of unallocated memory in which it will fit.
Problems:
  • It requires an expensive search of the entire free list to find the best hole.
  • More importantly, it leads to the creation of lots of little holes that are not big enough to satisfy any requests. This situation is called fragmentation, and is a problem for all memory-management strategies, although it is particularly bad for best-fit.

Solution:
One way to avoid making little holes is to give the client a bigger block than it asked for. For example, we might round all requests up to the next larger multiple of 64 bytes. That doesn't make the fragmentation go away, it just hides it.
  • Unusable space in the form of holes is called external fragmentation
  • Unusable space in the form of holes is called external fragmentation


Worst Fit - The memory manager places process in the largest block of unallocated memory available. The ides is that this placement will create the largest hole after the allocations, thus increasing the possibility that, compared to best fit, another process can use the hole created as a result of external fragmentation. 

First Fit - Another strategy is first fit, which simply scans the free list until a large enough hole is found. Despite the name, first-fit is generally better than best-fit because it leads to less fragmentation.
Problems:
  • Small holes tend to accumulate near the beginning of the free list, making the memory allocator search farther and farther each time.

Solution:
  • Next Fit


Next Fit - The first fit approach tends to fragment the blocks near the beginning of the list without considering blocks further down the list. Next fit is a variant of the first-fit strategy.The problem of small holes accumulating is solved with next fit algorithm, which starts each search where the last one left off, wrapping around to the beginning when the end of the list is reached (a form of one-way elevator)

Compaction

Compaction attacks the problem of fragmentation by moving all the allocated blocks to one end of memory, thus combining all the holes. Aside from the obvious cost of all that copying, there is an important limitation to compaction: Any pointers to a block need to be updated when the block is moved. Unless it is possible to find all such pointers, compaction is not possible. Pointers can stored in the allocated blocks themselves as well as other places in the client of the memory manager. In some situations, pointers can point not only to the start of blocks but also into their bodies. For example, if a block contains executable code, a branch instruction might be a pointer to another location in the same block. Compaction is performed in three phases. First, the new location of each block is calculated to determine the distance the block will be moved. Then each pointer is updated by adding to it the amount that the block it is pointing (in)to will be moved. Finally, the data is actually moved. There are various clever tricks possible to combine these operations.

Virtual Memory

Virtual memory strategies allow a process to use the CPU when only part of its address space is loaded in the primary. In this process, each process's address space is partitioned into parts that can be loaded into primary memory when they are needed and written back to secondary memory otherwise.
Virtual Memory
  • support programs that are larger than physical memory
  • allow several programs to reside in physical memory at a time (or atleast the currently relevant portions of them) to support multiprogramming.

Virtual Addressing

The essence of virtual memory systems is in developing efficient techniues for dynamic loading- binding virtual addresses to physical addreses at runtime. In virtual memory systems, the size of the virtual address space is greater than the size of the physical address space allocated to the process; that is, the process uses more virtual than physical addreses.
  • Physical Addressing leads to memory fragmentation problems.
  • A solution is Virtual Addressing.

                 >Each process appears to have its own private memory that starts at 
                    address 0. Processes are protected from each other.
                 >Process can be moved in memory by changing its base address

Managing Virtual Memory

Normally, blocks of information are taken from the disk and placed in the memory of the processor. The most common ways of determining the sizes of the blocks to be moved into and out of memory are:
  • Swapping
  • Paging
  • Segmentation

Swapping

If a disk is available, we can also swap blocked jobs out to disk. Swapping out a job involves copying all memory containing data (and program code if it has changed) onto the disk, and releasing the main memory that it was using. The process is then suspended until there is an opportunity to swap it back in again. All data and program code is restored from the copy on disk, allowing the process to continue running in main memory from the point before it was swapped out.

When a job finishes, we first swap back jobs from disk before allowing new jobs to start. When a job is blocked (either because it wants to do I/O or because our short-term scheduling algorithm says to switch to another job), we have a choice of leaving it in memory or swapping it out. One way of looking at this scheme is that it increases the multiprogramming level (the number of jobs ``in memory'') at the cost of making it (much) more expensive to switch jobs.

A variant of the MLFQ (multi-level feedback queues) selection algorithm is particularly attractive for this situation. The queues are numbered from 0 up to some maximum. When a job becomes ready, it enters queue zero. The CPU scheduler always runs a job from the lowest-numbered non-empty queue (i.e., the priority is the negative of the queue number). It runs a job from queue i for a maximum of i quanta. If the job does not block or complete within that time limit, it is added to the next higher queue. Jobs with short quanta in that short bursts get high priority, but does not incur the overhead of frequent swaps between jobs with long bursts. 

Paging

  • Paging allocates memory in fixed-size chunks called pages. A bounds register is not needed because all pages are the same size; if you need less than a page the rest is wasted (internal fragmentation). 
  • Each process has a page table that maps from the virtual page to physical page 
  • Page table can be large. Assume 32-bit address space, 4KB page, 4 bytes per page table entry. How big is the page table for each process? 
  • Say you have a PM(Physical memory) of size 16K.

You can support a large number of programs each of size say 1M.
Programs are provided with a virtual address space (1M).
OS takes care of fetching the location (from PM or from disk).
eg. load R, [20K] - This is a virtual address
programs thus work with virtual addresses. It is the OS' job to find out where it is and get the value. This is done by a mechanism called paging.
Divide the entire virtual address space into units called pages. eg. for the 1 MB, VAS, you have 256 virtual pages (0 ... 255)
Divide the physical memory into pages. eg. for the 16K physical memory, you have 4 pages (0 .. 3)
At any time some (atmost 4) of the virtual pages could be in one of the physical pages. The rest are on the disk in what is called the swap space.
A page is also the unit of transfer between the disk (swap space) and physical memory.
You keep track of which virtual page is in physical memory (if so where), and which is not.
This information is maintained in a page table. PT thus performs the mapping from VA(Virtual Address) to PA (Physical Address).
page table (indexed by virtual address) entry =
Note that this has to be done on every reference (both data and code), which can turn out to be very costly to do by software.
Usually this is done by a hardware unit called the MMU (which sits between the CPU and the cache). 
  • Steps performed by the MMU:

>given a VA, index into the PT(Page Table)
>check if present, if so, put out phys address on the bus and let the memory operation be performed
>what do you do if absent?
         -you have to go to disk and get it.
         -We do not want to do this by hardware.
         -Software should gain control at this point (access control).
         -Hardware raises a page fault.
         -This fault is caught by the OS, the corresponding page is brought in from the disk into a physical page, the PT entry is updated. In the process, a victim may need to be thrown out and its entry invalidated.

Multiple processes can be handled by either switching the address mappings at context switch times, or by simply extending the VAS with a few more bits (process id). 

  • What does a Page Table entry contain:

           >physical pg #,
           >present/absent bit,
           >protection (read/write/execute),
           >modified - on a write this is set. Will be useful when the page has to be replaced to find out if it has to be written to disk.
           >referenced - set each time you read/write to the page. (will be used later)
           >caching disabled - says do not cache it. Useful for I/O devices that are memory mapped.
  • Paging Issues:


           >Page tables are Large:
where is the PT itself stored? Well, if it is very small (a few entries), then you could just keep it in a few registers (or on chip somewhere). But typically they are large. For instance, 32 bit address space, and 4K page size => 1M pages 64 bit address spaces => ? We don't have that much memory. So what do we do? Usually a process does not access all of its address space at once. Exploit this locality factor.
Solution: multi-level page tables (like paging the page tables themselves). eg. 2-level PT, 32 bit VA = [10, 10, 12]. if a program is using some code, a little data and stack, then maybe he needs only the first, second and last entries of the first level PT. 

           >Mapping Must be Fast:

Translation is done on every memory reference (instruction fetch,load/store)
So it has to be fast.
The above mechanisms take atleast 1 additional memory reference (PT lookup) And for multi-level PTs this is even worse.
Make use of the locality in typical executions:
most programs tend to make a large no of references to a small no of pages
there is both temporal and spatial locality with respect to what you are likely to access in the future relative to what you accessed in the past. 
Solution: 
Transalation Lookaside Buffer


Saturday, August 18, 2012

Beauty from NAASCU

National Athletic Association of Schools, Colleges and Universities for past the 11 year has been celebrating and creating a bond between schools, colleges and universities through sports. NAASCU was established in 2001 and Dr. Ernesto Jay G. Adalem is the current president.

NAASCU again, for this year 2012-2013, opened its annual program for colleges and universities to compete and to aim for triumph. On its 12th season, the opening supposed to be on July 30 was moved due to flooded areas along Makati Coliseum and heavy rain fall, its TYPHOON GENER's fault. The opening was moved on August 1. 

Ms. Stacey Gonzales, a Centro Escolar University student, greeted the crowd with a heart-warming song which surely melted the hearts of everyone. After Ms. Gonzales' performance, the crowd immediately cheered for their school. Well, while everyone cheer using their voice and all sort of stuffs they can hit to create noise, CUP just brought DRUMS which literally dominates the whole coliseum. I'm not bitter, am I? Moving on, after the ear-damaging noise brought to us by CUP (most of it of course), other colleges and universities, its been followed by an opening prayer by Bro. Jeofrey Bautista of New Era University. Ms. Gonzales once again enchant the coliseum with her voice singing the national anthem together with ROTC Cadets of City University of Pasay holding the Philippine Flag and NAASCU Flag.

Dr. Ernesto Jay G. Adalem officially opened the 12th celebration of the NAASCU.

After the President's speech, the Philippines very own Mr. Alvin "The Captain" Patrimonio, gave his speech for the athletes and students to treasure. He said that there is no shortcut for success.

Then the crowd roared in unison when muses and escorts walked around the court showing everyone what they've got; beauty, wits and body.

I will cut you short for the information because I'm having this illness they call "Ningas Kugon" (Spelling disregarded).

Well, hunt me if you can for more information HAHA. Anyway, NEU's Muse, Miss Jana Paulina Don Diego is the First Runner up for the 12th NAASCU. Mr. and Ms. NAASCU both comes from STI, Carlo Saldana and Denise Sangalang.



Tuesday, August 7, 2012

Storage Devices

Storage Devices
Storage Device – storage medium and mechanism for storing and retrieving data on that medium.
Storage Medium – physical material that holds data presentation.
Benefits of Storage Devices
A.      Space
Ø  Diskette contains equivalent of 5000 printed pages.
Ø  Optical Disk can hold 500 books.
B.      Reliability
Ø  Relatively safe.
C.      Convenience
Ø  Easy and quick location of data.
D.      Economy
Ø  Saving in storage cost.
Types of Storage Devices
A.      Volatile – data presentation are erased when electric power is turned off.
B.      Non-volatile – data presentation is retained even without power.
Secondary Storages
A.      Magnetic Disk Storage
a.       Diskettes – three and a half inches diskette holds 1.44MB of data.
b.      Hard Disk

Ø  How data is organized:

a.       Track – circular portion of the disk surface that passes under the read/write head. Floppy diskette has 80 tracks on each surface. Hard disk may have 1,000+ tracks on each platter.
b.      Sector – each track is divided into sectors that hold a fixed number of bytes. 512 bytes per sector.
c.       Cluster – a fixed number of adjacent sectors that are treated as a unit of storage. Typically two to eight sectors, depending on the operating system.
d.      Cylinder – the track on each surface that is beneath read/write head at a given position of the read/write head.
Ø  Data Transfer (measure of performance)
a.       Average access time
b.      Data transfer rate
A.      Optical Disk Storage
a.       Compact Disk
§  CD-ROM – drive can only read data from CDs. Stores up to 700MB per disk.
§  CD-R – recordable. Write disk only once.
§  CD-RW – rewritable. Can erase and record over data multiple times.
b.      Digital Versatile Disk
§  Capacity up to 17GB.
§  Allows full length movie.
§  Sounds better than on audio CD.
c.       Magnetic tape storage
§  Similar to tapes used in music cassettes.

Monday, August 6, 2012

File Organization: Basic Concepts and Definition

Hierarchy of data
1.       Bit – smallest unit, consist of 1 & 0.
2.       Byte/Character – eight bits, alpha-numeric and special characters.
3.       Field – combination of two or more characters, column of a database.
4.       Record – combination of two or more related fields, row of a database.
5.       File – combination of two or more related records.
6.       Database – combination of two or more related files, intersection of field and record.
Types of files
A.      According to function
a.       Master file – static view of some aspects of an organization, no deletion of data.
Example: Transcript of record
b.      Transaction file – changes that are applied to the master file.
Example: Receipt/OR
c.       Program file – contains instruction for the processing of data.
Example: Visual basic program file, File.vbp
d.      Work file – a temporary file. Files in progress.
Example: File.temp
e.      Report file – formatted for presentation to a user.
Example: Power point presentation, File.ppt
f.        Text file – contains alphanumeric and graphic data input using a text editor program.
Example: Word file, File.docx
g.       Security file – used to protect and protect changes to a certain file.
Example: Password
h.      Audit file – contains audit and error listing that includes all transactions process.
Example: Users.log
B.      According to mode of access
a.       Input – read by the program.
b.      Output – written to by a program.
c.       Input/Output – both read from and written to during program execution.
File Operation
A.      Batch mode – process of transaction accumulated over a period of time. The performance is measured by throughput.
B.      Interactive mode – transactions are processed as they arrived. The performance is measured by response time.

Fundamental File Operations
A.      Creation – loading of a file.
B.      Update – changing the content of a master file to make it reflect a more current snapshot of the real world.
Changes may include:
Ø  Insertion of new record.
Ø  Modification of existing record.
C.      Retrieval – the access of file.
a.       Inquiry – low volume response.
b.      Report – create many pages of output.
Retrieval request can be:
Ø  Comprehensive – retrieval reports inform action from all the report information from all the record in a file.
Ø  Selective – retrieval applies more qualification or criteria to chose which record will supply information for output.
D.      Maintenance – change of mode to file to the purpose of improving the performance of program that access them.
a.       Restructing – a file implies that structured are mode to the file within the context of the same file organization.
b.      Reorganization – implies change from one file organization to another.

Methods of Organizing Data
A.      Sequential – records are stored in order according to a key field. If a particular record is desired, all prior records must be read first.
Example: Tape storage
B.      Direct – also called as random access. Go directly to desired record by using a key.
Example: MP4 player
C.      Indexed – combines elements of sequential and direct methods. Records are stored sequentially, but file also contains an index, data accessed by a record key.
Example: Dictionary
Record key
A.      Primary – search the record uniquely.
Example: Student number
B.      Secondary – search the record but not uniquely.
Example: Surname
C.      Foreign – combination of primary and secondary.
Example: Full name and school attending