General-Purpose Computer

Embedded Systems Landscape

Peter Barry , Patrick Crowley , in Modern Embedded Computing, 2012

System Resources and Features

General-purpose and embedded computer systems differ most in the variability of system resources and features rather than in their quantity. Embedded computer systems are typically designed and deployed with a relatively static and predetermined set of system resources and features.

This fact simplifies systems software and certain system processes, such as booting the system or diagnosing issues. For example, the boot process for an IA-32-based general-purpose computer, and the design of the software that implements that process, must be organized to contend with an unpredictable set of memory and I/O resources when the system starts. This resource uncertainty is not present in most embedded computer systems; hence, embedded system boot processes are shorter and simpler.

URL: https://www.sciencedirect.com/science/article/pii/B9780123914903000011

Larger computers

G.R. Wilson , in Embedded Systems and Computer Architecture, 2002

14.3 Storage within a computer

A general-purpose computer, such as a popular personal computer, contains various storage devices, such as main memory, magnetic disks, and optical disks. Optical disks and floppy magnetic disks are used principally to store programs and data on a device that is external to the computer. These storage media are convenient for the retail distribution of programs and for the archiving of data in a fashion that is secure against a failure of the computer. Magnetic hard disks are used to store programs and data in a form that is ready to be accessed by the computer without the user having to insert an optical or floppy disk. The main memory store in a computer is made from a number of RAM devices, and is used to store code and data for programs that the computer is currently using. Finally, the microprocessor itself contains registers that store the data that is currently being processed.

We can regard these storage devices as being in a hierarchy, ordered according to how close they are to the microprocessor (Figure 14.1). In general, a high access speed implies small size and high price per byte.

Figure 14.1. Memory hierarchy

URL: https://www.sciencedirect.com/science/article/pii/B9780750650649500158

From the Ground Up!

Luis F. Chaparro , Aydin Akan , in Signals and Systems Using MATLAB (3rd Edition), 2019

0.3 Implementation of Digital Signal Processing Algorithms

Continuous-time signals are typically processed using analog systems composed of electrical circuit components such as resistors, capacitors, and inductors together with semiconductor electronic components such as diodes, transistors, and operational amplifiers, among others. Digital signals, on the other hand, are sequences of numbers, and processing them requires numerical manipulation of these sequences. Simple addition, multiplication, and delay operations are enough to implement many discrete-time systems. Thus, digital signal processing systems are easier to design, develop, simulate, test, and implement than analog systems, by using flexible, reconfigurable, and reliable software and hardware tools. Digital signal processing systems are employed these days in many applications such as cell phones, household appliances, cars, ships and airplanes, smart home applications, and many other consumer electronic devices. The fast development of digital technology has made high-capacity processing hardware available at reasonable cost for real-time applications. Refer to [44,54] for in-depth details.

A digital signal processing system may be used to perform a task on an analog signal x(t), or on an inherently discrete-time signal x[n]. In the former case, the analog signal is first converted into digital form by using an analog-to-digital converter, which performs sampling of the analog signal, quantization of the samples, and encoding of the amplitude values using a binary representation. A digital signal processing system may be represented by a mathematical equation defining the output signal as a function of the input by using arithmetic operations. Designing these systems requires the development of an algorithm that implements these arithmetic operations.
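The point that delay, multiplication, and addition suffice for many discrete-time systems can be illustrated with a short sketch (in Python rather than MATLAB or C/C++; the 3-tap filter and its coefficients are invented for illustration):

```python
# Sketch of a discrete-time system built from the three primitive
# operations named above: delay, multiplication, and addition.
# A 3-tap FIR filter: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2].

def fir_filter(b, x):
    """Filter the sequence x with FIR coefficients b."""
    state = [0.0] * len(b)                   # delay line: x[n], x[n-1], ...
    y = []
    for sample in x:
        state = [sample] + state[:-1]        # the delay operation
        y.append(sum(bi * si for bi, si in zip(b, state)))  # multiply-add
    return y

# Driving an FIR filter with a unit impulse returns its coefficients:
print(fir_filter([0.25, 0.5, 0.25], [1, 0, 0, 0, 0]))
# → [0.25, 0.5, 0.25, 0.0, 0.0]
```

The same multiply-accumulate loop, executed once per input sample, is the workload the dedicated hardware discussed below is designed to accelerate.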

A general-purpose computer may be used to develop and test these algorithms. Algorithm development, debugging, and testing are generally done using a high-level programming tool such as MATLAB or C/C++. Upon successful development of the algorithm, and after running simulations on test signals, the algorithm is ready to be implemented on hardware. Digital signal processing applications often require heavy arithmetic operations, e.g., repeated multiplications and additions, and as such dedicated hardware is required. Possible hardware for a real-time implementation of the developed algorithms includes:

General-purpose microprocessors (μPs) and micro-controllers (μCs).

General-purpose digital signal processors (DSPs).

Field-programmable gate arrays (FPGAs).

Selecting the best implementation hardware depends on the requirements of the application, such as performance, cost, size, and power consumption.

0.3.1 Microprocessors and Micro-Controllers

With increasing clock frequencies (for processing fast-changing signals) and lower costs, general-purpose microprocessors and micro-controllers have become capable of handling many digital signal processing applications. However, complex operations such as multiplication and division are time consuming for general-purpose microprocessors since they need a series of operations. These processors do not have the best architecture or on-chip facilities required for efficient digital signal processing operations. Moreover, they are usually not cost effective or power efficient for many applications.

Micro-controllers are application-specific micro-computers that contain built-in hardware components such as a central processing unit (CPU), memory, and input/output (I/O) ports. As such, they are referred to as embedded controllers. A variety of consumer and industrial electronic products such as home appliances, automotive control applications, medical devices, space and military applications, wireless sensor networks, smart phones, and games are designed using micro-controllers. They are preferred in many applications due to their small size, low cost, and provision of processor, memory, and random-access memory (RAM) components all together in one chip.

A very popular micro-controller platform is the Arduino electronic board, with an on-board micro-controller and the necessary input/output ports. Arduino is an open-source and flexible platform that offers a very simple way to design a digital signal processing application. The built-in micro-controller is produced in an architecture having a powerful arithmetic logic unit that enables very fast execution of operations. A user-friendly software development environment is available for free, and it makes it very easy to design digital signal processing systems on Arduino boards.

0.3.2 Digital Signal Processors

A digital signal processor is a fast special-purpose microprocessor with an architecture and instruction set designed specifically for efficient implementation of digital signal processing algorithms. Digital signal processors are used for a wide range of applications, from communications and control to speech and image processing. Embedded digital signal processors are often used in consumer products such as mobile phones, fax/modems, disk drives, radios, printers, medical and health care devices, MP3 players, high-definition television (HDTV), and digital cameras. These processors have become a very popular choice for a broad range of consumer applications, since they are very cost effective. Software development for digital signal processors has been facilitated by specially designed software tools. DSPs may be reprogrammed in the field to upgrade the product or to fix software bugs, with useful built-in software development tools including a project build environment, a source code editor, a C/C++ compiler, a debugger, a profiler, a simulator, and a real-time operating system. Digital signal processors provide the advantages of microprocessors, while being easy to use, flexible, and lower in cost.

0.3.3 Field Programmable Gate Arrays

Another way to implement a digital signal processing algorithm is to use field-programmable gate arrays (FPGAs), which are field-programmable logic elements, or programmable devices that contain fields of small logic blocks (normally NAND gates) and elements. The logic block size in field-programmable logic elements is referred to as the "granularity," which is related to the effort required to complete the wiring between the blocks. There are three main granularity classes:

Fine granularity or Pilkington (sea of gates) architecture

Medium granularity

Large granularity (Complex Programmable Logic Devices)

Wiring or linking between the gates is realized by using a programming tool. The field-programmable logic elements are produced in various memory technologies that allow the device to be reprogrammable, requiring short programming times and offering protection against unauthorized use. For many high-bandwidth signal processing applications such as wireless, multimedia, and satellite communications, FPGA technology provides a better solution than digital signal processors.

URL: https://www.sciencedirect.com/science/article/pii/B9780128142042000090

Low-Level Efficiency Issues

Peter Norvig , in Paradigms of Artificial Intelligence Programming, 1992

10.1 Use Declarations

On general-purpose computers running Lisp, much time is spent on type-checking. You can gain efficiency at the cost of robustness by declaring, or promising, that certain variables will always be of a given type. For example, consider the following function to compute the sum of the squares of a sequence of numbers:

(defun sum-squares (seq)
  (let ((sum 0))
    (dotimes (i (length seq))
      (incf sum (square (elt seq i))))
    sum))

(defun square (x) (* x x))

If this function will only be used to sum vectors of fixnums, we can make it a lot faster by adding declarations:

(defun sum-squares (vect)
  (declare (type (simple-array fixnum *) vect)
           (inline square) (optimize speed (safety 0)))
  (let ((sum 0))
    (declare (fixnum sum))
    (dotimes (i (length vect))
      (declare (fixnum i))
      (incf sum (the fixnum (square (svref vect i)))))
    sum))

The fixnum declarations let the compiler use integer arithmetic directly, rather than checking the type of each addend. The (the fixnum …) special form is a promise that the argument is a fixnum. The (optimize speed (safety 0)) declaration tells the compiler to make the function run as fast as possible, at the possible expense of making the code less safe (by ignoring type checks and so on). Other quantities that can be optimized are compilation-speed, space, and, in ANSI Common Lisp only, debug (ease of debugging). Quantities can be given a number from 0 to 3 indicating how important they are; 3 is most important and is the default if the number is left out.
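To picture what these declarations buy, here is a Python sketch (function names invented; not any actual Lisp runtime's API) of the per-addition dispatch a runtime performs when no declarations are given, versus the direct add the compiler can emit once the fixnum promise is in force:

```python
def generic_add(a, b):
    # Without declarations, the runtime must inspect each addend's type
    # and branch to the right routine (fixnum add, float add, bignum add...).
    if isinstance(a, int) and isinstance(b, int):
        return a + b                      # fast path: machine integer add
    if isinstance(a, float) or isinstance(b, float):
        return float(a) + float(b)        # slower path: float coercion
    raise TypeError("not a number")

def declared_add(a, b):
    # With (declare (fixnum a b)) and (safety 0), the compiler emits the
    # add directly, trusting the programmer's promise and skipping checks.
    return a + b

print(generic_add(2, 3))    # → 5
print(declared_add(2, 3))   # → 5
```

The two functions compute the same answer; the declarations simply license the compiler to delete the branching in `generic_add` from every addition in the loop.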

The (inline square) declaration allows the compiler to generate the multiplication specified by square right in the loop, without explicitly making a function call to square. The compiler will create a local variable for (svref vect i) and will not execute the reference twice; inline functions do not have any of the problems associated with macros as discussed on page 853. However, there is one drawback: when you redefine an inline function, you may need to recompile all the functions that call it.

You should declare a function inline when it is short and the function-calling overhead will thus be a significant part of the total execution time. You should not declare a function inline when the function is recursive, when its definition is likely to change, or when the function's definition is long and it is called from many places.

In the case at hand, declaring the function inline saves the overhead of a function call. In some cases, further optimizations are possible. Consider the predicate starts-with:

(defun starts-with (list x)
  "Is this a list whose first element is x?"
  (and (consp list) (eql (first list) x)))

Suppose we have a code fragment like the following:

(if (consp list) (starts-with list x) …)

If starts-with is declared inline, this will expand to:

(if (consp list) (and (consp list) (eql (first list) x)) …)

which many compilers will simplify to:

(if (consp list) (eql (first list) x) …)

Very few compilers do this kind of simplification across functions without the hint provided by inline.

Besides eliminating run-time type checks, declarations also allow the compiler to choose the most efficient representation of data objects. Many compilers support both boxed and unboxed representations of data objects. A boxed representation includes enough information to determine the type of the object. An unboxed representation is just the "raw bits" that the computer can deal with directly. Consider the following function, which is used to clear a 1024 × 1024 array of floating point numbers, setting each one to zero:

(defun clear-m-array (array)
  (declare (optimize (speed 3) (safety 0)))
  (declare (type (simple-array single-float (1024 1024)) array))
  (dotimes (i 1024)
    (dotimes (j 1024)
      (setf (aref array i j) 0.0))))

In Allegro Common Lisp on a Sun SPARCstation, this compiles into quite good code, comparable to that produced by the C compiler for an equivalent C program. If the declarations are omitted, however, the performance is about 40 times worse.

The problem is that without the declarations, it is not safe to store the raw floating point representation of 0.0 in each location of the array. Instead, the program has to box the 0.0, allocating storage for a typed pointer to the raw bits. This is done inside the nested loops, so the result is that each call to the version of clear-m-array without declarations calls the floating-point-boxing function 1,048,576 times, allocating a megaword of storage. Needless to say, this is to be avoided.
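The boxing cost can be pictured with a schematic Python sketch (the Box class and allocation counter are invented for illustration, not any particular Lisp's actual representation); a small 4 × 4 array stands in for the 1024 × 1024 one:

```python
# Schematic boxed representation: a heap cell pairing a type tag with raw bits.
class Box:
    __slots__ = ("tag", "bits")
    def __init__(self, tag, bits):
        self.tag = tag
        self.bits = bits

allocations = 0

def box_float(x):
    # Each store of 0.0 into the undeclared array allocates one of these.
    global allocations
    allocations += 1
    return Box("single-float", x)

# Clearing a 4x4 array without declarations: one allocation per element.
rows, cols = 4, 4
boxed_array = [[box_float(0.0) for _ in range(cols)] for _ in range(rows)]
print(allocations)  # → 16

# With declarations, the compiler stores the raw bits directly -- no boxes:
unboxed_array = [[0.0] * cols for _ in range(rows)]
```

Scaled up to 1024 × 1024 elements, the boxed version performs roughly a million such allocations per call, which is exactly the megaword of garbage the text describes.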

Not all compilers heed all declarations; you should check before wasting time with declarations your compiler may ignore. The function disassemble can be used to show what a function compiles into. For example, consider this trivial function to add two numbers together. Here it is with and without declarations:

(defun f (x y)
  (declare (fixnum x y) (optimize (safety 0) (speed 3)))
  (the fixnum (+ x y)))

(defun g (x y) (+ x y))

Here is the disassembled code for f from Allegro Common Lisp for a Motorola 68000-series processor:

> (disassemble 'f)
;; disassembling #<Function f @ #x83ef79>
;; formals: x y
;; code vector @ #x83ef44
0: link a6,#0
4: move.l a2,-(a7)
6: move.l a5,-(a7)
8: move.l 7(a2),a5
12: move.l 8(a6),d4 ; y
16: add.l 12(a6),d4 ; x
20: move.l #1,d1
22: move.l -8(a6),a5
26: unlk a6
28: rtd #8

This may look intimidating at first glance, but you don't have to be an expert at 68000 assembler to gain some appreciation of what is going on here. The instructions labeled 0–8 (labels are in the leftmost column) comprise the typical function preamble for the 68000. They do subroutine linkage and store the new function object and constant vector into registers. Since f uses no constants, instructions 6, 8, and 22 are actually unnecessary and could be omitted. Instructions 0, 4, and 26 could also be omitted if you don't care about seeing this function in a stack trace during debugging. More recent versions of the compiler will omit these instructions.

The heart of function f is the two-instruction sequence 12–16. Instruction 12 retrieves y, and 16 adds y to x, leaving the result in d4, which is the "result" register. Instruction 20 sets d1, the "number of values returned" register, to 1.

Contrast this to the code for g, which has no declarations and is compiled at default speed and safety settings:

> (disassemble 'g)
;; disassembling #<Function g @ #x83dbd1>
;; formals: x y
;; code vector @ #x83db64
0: add.l #8,31(a2)
4: sub.w #2,d1
6: beq.s 12
8: jmp 16(a4) ; wnaerr
12: link a6,#0
16: move.l a2,-(a7)
18: move.l a5,-(a7)
20: move.l 7(a2),a5
24: tst.b -208(a4) ; signal-hit
28: beq.s 34
30: jsr 872(a4) ; process-sig
34: move.l 8(a6),d4 ; y
38: move.l 12(a6),d0 ; x
42: or.l d4,d0
44: and.b #7,d0
48: bne.s 62
50: add.l 12(a6),d4 ; x
54: bvc.s 76
56: jsr 696(a4) ; add-overflow
60: bra.s 76
62: move.l 12(a6),-(a7) ; x
66: move.l d4,-(a7)
68: move.l #2,d1
70: move.l -304(a4),a0 ; +_2op
74: jsr (a4)
76: move.l #1,d1
78: move.l -8(a6),a5
82: unlk a6
84: rtd #8

See how much more work is done. The first four instructions ensure that the right number of arguments have been passed to g. If not, there is a jump to wnaerr (wrong-number-of-arguments-error). Instructions 12–20 contain the argument loading code that was at 0–8 in f. At 24–30 there is a check for asynchronous signals, such as the user hitting the abort key. After x and y are loaded, there is a type check (42–48). If the arguments are not both fixnums, then the code at instructions 62–74 sets up a call to +_2op, which handles type coercion and non-fixnum addition. If all goes well, we don't have to call this routine, and do the addition at instruction 50 instead. But even then we are not done; just because the two arguments were fixnums does not mean the result will be. Instructions 54–56 check and branch to an overflow routine if needed. Finally, instructions 76–84 return the final value, just as in f.

Some low-quality compilers ignore declarations altogether. Other compilers don't need certain declarations, because they can rely on special instructions in the underlying architecture. On a Lisp Machine, both f and g compile into the same code:

6 PUSH ARG|0 ; X
7 + ARG|1 ; Y
8 RETURN PDL-POP

The Lisp Machine has a microcoded + instruction that simultaneously does a fixnum add and checks for non-fixnum arguments, branching to a subroutine if either argument is not a fixnum. The hardware does the work that the compiler has to do on a conventional processor. This makes the Lisp Machine compiler simpler, so compiling a function is faster. However, on modern pipelined computers with instruction caches, there is little or no advantage to microcoding. The current trend is away from microcode toward reduced instruction set computers (RISC).

On most computers, the following declarations are most likely to be helpful:

fixnum and float. Numbers declared as fixnums or floating-point numbers can be handled directly by the host computer's arithmetic instructions. On some systems, float by itself is not enough; you have to say single-float or double-float. Other numeric declarations will probably be ignored. For instance, declaring a variable as integer does not help the compiler much, because bignums are integers. The code to add bignums is too complex to put inline, so the compiler will branch to a general-purpose routine (like +_2op in Allegro), the same routine it would use if no declarations were given.

list and array. Many Lisp systems provide separate functions for the list and array versions of commonly used sequence functions. For example, (delete x (the list l)) compiles into (sys:delete-list-eql x l) on a TI Explorer Lisp Machine. Another function, sys:delete-vector, is used for arrays, and the generic function delete is used only when the compiler can't tell what type the sequence is. So if you know that the argument to a generic function is either a list or an array, then declare it as such.

simple-vector and simple-array. Simple vectors and arrays are those that do not share structure with other arrays, do not have fill pointers, and are not adjustable. In many implementations it is faster to aref a simple-vector than a vector. It is certainly much faster than taking an elt of a sequence of unknown type. Declare your arrays to be simple (if they in fact are).

(array type). It is often important to specialize the type of array elements. For example, an (array short-float) may take only half the storage of a general array, and such a declaration will usually allow computations to be done using the CPU's native floating-point instructions, rather than converting into and out of Common Lisp's representation of floating points. This is very important because the conversion normally requires allocating storage, but the direct computation does not. The specifiers (simple-array type) and (vector type) should be used instead of (array type) when appropriate. A very common mistake is to declare (simple-vector type). This is an error because Common Lisp expects (simple-vector size); don't ask me why.

(array type dimensions). The full form of an array or simple-array type specifier is (array type dimensions). So, for example, (array bit (* *)) is a two-dimensional bit array, and (array bit (1024 1024)) is a 1024 × 1024 bit array. It is very important to specify the number of dimensions when known, and less important to specify the exact size, although with multidimensional arrays, declaring the size is more important. The format for a vector type specifier is (vector type size).

Note that several of these declarations can apply all at once. For example, in

(position #\. (the simple-string file-name))

the variable file-name has been declared to be a vector, a simple array, and a sequence of type string-char. All three of these declarations are helpful. The type simple-string is an abbreviation for (simple-array string-char).

This guide applies to most Common Lisp systems, but you should look in the implementation notes for your particular system for more advice on how to fine-tune your code.

URL: https://www.sciencedirect.com/science/article/pii/B9780080571157500108

Microcomputer Buses and Links

J.D. Nicoud, in Encyclopedia of Physical Science and Technology (Third Edition), 2003

I.A Introduction

In any general-purpose computer, workstation, or dedicated controller based on a microprocessor, data transfers are continuously being performed between the processor, the memory, and the input/output (I/O) devices. Frequent transfers imply a high bandwidth, economically feasible only for short distances. For distances greater than a few meters, the cost of the electrical or optical lines forces the serialization of information.

A typical computer system consists of the processor (master) and several memory and I/O devices (slaves) interconnected by a set of data and control lines named buses (Fig. 1). These devices are generally clearly recognizable when they are connected by a backplane bus (Fig. 12). They are often mixed on a single-board computer. The bus allows bidirectional transfers between a possibly variable set of devices. The links toward the peripherals have a simpler structure since they are point to point. Connecting several devices on a bus, or transferring data over long distances, implies solving many electrical problems correctly and taking care of the propagation time inside devices and over the transmission lines.
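The master/slave organization of Fig. 1 can be sketched as a toy address-decoding bus (the address map and device names below are invented for illustration):

```python
# Toy bus: the master issues reads/writes; the bus decodes the address
# and routes the transfer to whichever slave claims that range.
class Slave:
    def __init__(self, base, size):
        self.base, self.size = base, size
        self.store = {}
    def claims(self, addr):
        return self.base <= addr < self.base + self.size
    def read(self, addr):
        return self.store.get(addr - self.base, 0)
    def write(self, addr, value):
        self.store[addr - self.base] = value

class Bus:
    def __init__(self, slaves):
        self.slaves = slaves
    def _decode(self, addr):
        for s in self.slaves:
            if s.claims(addr):
                return s
        raise ValueError("bus error: no slave at address %#x" % addr)
    def read(self, addr):
        return self._decode(addr).read(addr)
    def write(self, addr, value):
        self._decode(addr).write(addr, value)

memory = Slave(0x0000, 0x8000)      # invented map: RAM at 0x0000
io     = Slave(0x8000, 0x0100)      # I/O registers at 0x8000
bus = Bus([memory, io])
bus.write(0x0010, 42)               # master writes to memory
bus.write(0x8004, 7)                # master writes to an I/O register
print(bus.read(0x0010), bus.read(0x8004))  # → 42 7
```

The point-to-point peripheral links mentioned above skip the decode step entirely, which is why their structure is simpler than the shared bus.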

Figure 1. Typical computer system.

Figure 12. Typical board size for standard buses.

URL: https://www.sciencedirect.com/science/article/pii/B0122274105004397

Symmetric Multiprocessor Architecture

Thomas Sterling, ... Maciej Brodowicz, in High Performance Computing, 2018

6.2 Architecture Overview

An SMP is a full-standing, self-sufficient computer system with all subsystems and components needed to serve the requirements and support actions necessary to perform the computation of an application. It can be employed independently for user applications cast as shared-memory multiple-threaded programs or as one of many equivalent subsystems integrated to form a scalable distributed-memory massively parallel processor (MPP) or commodity cluster. It can also operate as a throughput computer supporting multiprogramming of concurrent independent jobs or as a platform for multiprocess message-passing jobs, even though the interprocess data exchange is achieved through shared memory transparent to the parallel programming interface. The following sections describe the key subsystems in some detail to convey how they contribute to achieving performance, principally through parallelism and various functionality with distinct technologies. This section begins with a brief overview of the total system of an SMP architecture and the basic purposes of its major components, to provide a context for the later detailed discussions.

Like any general-purpose computer, an SMP serves a key set of functions on behalf of the user application, either directly in hardware or indirectly through the supporting operating system. These are typically:

instruction issue and operation functions through the processor core

program instruction storage and application data storage upon which the processor cores operate

mass and persistent storage to hold all information required over long periods of time

internal data movement communication paths and control to transfer intermediate values between subsystems and components within the SMP

input/output (I/O) interfaces to external devices outside the SMP, including other mass storage, computing systems, interconnection networks, and user interfaces, and

control logic and subsystems to manage SMP operation and coordination among processing, memory, internal data paths, and external communication channels.

The SMP processor cores perform the main execution functions for the application programs. While these devices contain substantial design complexity (described later), their primary operation is to identify the next instruction in memory to execute, read that instruction into a special instruction register, and decode the binary instruction encoding to determine the purpose of the operation and the sequence of hardware signals to be generated to control the execution. The instruction is issued to the pipelined execution unit, and with its related data it proceeds through a sequence of microoperations to determine a final result. Usually the initial and resulting data are acquired from and deposited to special storage elements called registers: very high-speed (high-bandwidth, low-latency) latches that hold temporary values. Somewhat simplistically, there are five classes of operations that make up the overall functionality of the processor core.

1. The basic register-to-register integer, logic, and character operations.

2. Floating-point operations on real values.

3. Conditional branch operations to control the sequence of operations performed dependent on intermediate data values (usually Boolean).

4. Memory access operations to move data to and from registers and the main memory system.

5. Actions that initiate control of data through external I/O channels, including transfer to mass storage.
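The fetch, decode, and execute behavior described above, covering operation classes 1–4, can be sketched as a toy interpreter (the instruction encoding and register names are invented for illustration):

```python
# Toy fetch-decode-execute loop for a register machine.
# Instructions are (opcode, dest, src1, src2) tuples -- an invented encoding.
def run(program, registers):
    pc = 0                                   # identify the next instruction
    while pc < len(program):
        op, d, s1, s2 = program[pc]          # fetch into the "instruction register"
        pc += 1
        if op == "add":                      # decode, then execute
            registers[d] = registers[s1] + registers[s2]
        elif op == "mul":
            registers[d] = registers[s1] * registers[s2]
        elif op == "halt":
            break
    return registers

regs = run([("add", "r2", "r0", "r1"),       # r2 = r0 + r1
            ("mul", "r3", "r2", "r2"),       # r3 = r2 * r2
            ("halt", 0, 0, 0)],
           {"r0": 3, "r1": 4, "r2": 0, "r3": 0})
print(regs["r3"])  # → 49
```

A real core pipelines these steps and breaks each instruction into microoperations, but the locate/fetch/decode/execute sequence is the same.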

Until 2005 essentially all processors in the age of very large-scale integration (VLSI) technology were single-microprocessor integrated circuits. But with the progress of semiconductor technology reflecting Moore's law and the limitations on instruction-level parallelism (ILP) and clock rates due to power constraints, multicore processors (or sockets), starting with dual-core sockets, have dominated the processor market over the last decade. Today processors may contain a few cores, 6–16, with new classes of lightweight architectures permitting sockets of greater than 60 cores on a chip. An SMP may contain one or more such sockets to provide its processing capability (Fig. 6.1). Peak performance of an SMP is approximated by the product of the number of sockets, the number of cores per socket, the number of operations per instruction, and the clock rate that normally determines the instruction issue rate. This is summarized in Eq. (6.1).

Figure 6.1. Internal to the SMP are the intranode data paths, standard interfaces, and motherboard control elements.

(6.1) P_peak = N_sockets × N_cores per socket × R_clock × N_operations per instruction
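Eq. (6.1) can be evaluated directly; the figures below are illustrative, not taken from the text:

```python
# Peak performance per Eq. (6.1): the product of socket count, cores per
# socket, clock rate, and operations issued per instruction.
def peak_performance(n_sockets, n_cores_per_socket, r_clock_hz, n_ops_per_instr):
    return n_sockets * n_cores_per_socket * r_clock_hz * n_ops_per_instr

# Hypothetical node: 2 sockets, 16 cores per socket, 2.5 GHz clock,
# 8 operations per instruction (e.g., a wide SIMD unit).
print(peak_performance(2, 16, 2.5e9, 8))  # → 640000000000.0, i.e., 640 Gops/s
```

Note this is an upper bound; sustained performance depends on keeping the pipelines fed, which is what the memory hierarchy discussed next is for.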

The SMP memory consists of multiple layers of semiconductor storage with complex control logic to manage the access of data from the memory by the processor cores, transparent vertical migration through the cache hierarchy, and cache consistency across the many cache stacks supporting the processor cores. The SMP memory, in terms of the location of the data being operated on, is in fact three separate kinds of hardware. Already mentioned are the processor core registers: very fast latches that have their own namespace and provide the fastest access time (less than one cycle) and lowest latency. Each core has its own sets of registers that are unique to it and separated from all others. The main memory of the SMP is a large set of memory modules divided into memory banks that are accessible by all the processors and their cores. Main memory is implemented on separate dynamic random access memory (DRAM) chips and plugged into the SMP motherboard's industry-standard memory interfaces (physical, logical, and electrical). Data in the main memory is accessed through a virtual address that the processor translates to a physical address location in the main memory. Typically an SMP will have from 1 to 4 gigabytes of main memory capacity per processor core.

Between the processor core register sets and the SMP main memory banks are the caches. Caches span the gap in speed between the rate at which the processor core accesses data and the rate at which the DRAM can provide it. The difference between these two is easily two orders of magnitude, with a core fetch rate on the order of two accesses per nanosecond and the memory cycle time on the order of 100   ns. To bridge this gap, the cache layers exploit temporal and spatial locality. In simple terms, this means that the cache system relies on data reuse. Ideally, data access requests will be satisfied with data present in the level 1 (L1) cache, which operates at a throughput equivalent to the demand rate of a processor core and a latency of 1 to 4 cycles. This assumes that the sought-after data has already been accessed before (temporal locality) or that it is very near data already accessed (spatial locality). Under these conditions, a processor core can operate very near its peak performance capability. But due to size and power requirements, L1 caches (both data and instruction) are relatively small and susceptible to overflow; there is a need for more data than can be held in the L1 cache alone. To address this, a level 2 (L2) cache is almost always incorporated, again on the processor socket, for each core or sometimes shared among cores. The L2 cache holds both data and instructions and is much larger than the L1 caches, although much slower. L1 and L2 caches are implemented with static random access memory (SRAM) circuit design. As the separation between core clock rates and main memory cycle times grew, a third level of cache, L3, was included, usually implemented as a DRAM chip integrated within the same multi-chip module packaging as the processor socket. The L3 cache will often be shared among two or more cores on the processor package.
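
The payoff of spatial locality can be observed directly from software. The sketch below is an illustration, not from the original text: it sums the same matrix twice, once row by row (consecutive addresses that share cache lines) and once column by column (striding across cache lines); on most machines the first traversal is several times faster even though both compute the same value.

```java
// Sketch: spatial locality in the cache hierarchy.
// Row-major traversal touches consecutive addresses, so one fetched cache
// line serves many accesses; column-major traversal strides by a full row
// per access and misses far more often.
public class LocalityDemo {
    public static int[][] makeMatrix(int n) {
        int[][] m = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                m[i][j] = i + j;
        return m;
    }

    public static long rowMajorSum(int[][] m) {
        long s = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[i].length; j++)
                s += m[i][j];          // consecutive addresses
        return s;
    }

    public static long colMajorSum(int[][] m) {
        long s = 0;
        for (int j = 0; j < m.length; j++)
            for (int i = 0; i < m.length; i++)
                s += m[i][j];          // one row-sized stride per access
        return s;
    }

    public static void main(String[] args) {
        int[][] m = makeMatrix(2048);
        long t0 = System.nanoTime();
        long a = rowMajorSum(m);
        long t1 = System.nanoTime();
        long b = colMajorSum(m);
        long t2 = System.nanoTime();
        System.out.printf("row-major %d ns, col-major %d ns (sums equal: %b)%n",
                t1 - t0, t2 - t1, a == b);
    }
}
```

The timing gap, not the sums, is the point: both loops perform identical arithmetic, and only their memory access patterns differ.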

This contributes to the second critical property of the SMP memory hierarchy: cache coherency. The symmetric multiprocessing attribute requires copies of main memory data values that are held in caches for fast access to be consistent. When two or more copies of a value with a virtual address are in distinct physical caches, a change to the value of one of those copies must be reflected in the values of all others. Sometimes the actual value may be changed to the updated value, although more frequently the other copies are simply invalidated so that an obsolete value is not read and used. There are many hardware protocols that ensure the correctness of data copies, starting as early as the 1980s with the modified exclusive shared invalid (MESI) [1] family of protocols. The necessity of maintaining such data coherence across caches within an SMP adds design complexity, data access time, and energy consumption.
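
A minimal write-invalidate scheme can be sketched in a few lines. The code below illustrates the invalidation idea only; it is not the MESI protocol itself: whenever one cache writes an address, every other cache drops its copy, so the next read in those caches misses and refetches the current value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: write-invalidate coherence between per-core caches.
// On a write, all other caches drop their copy of the address; a later
// read in those caches misses and refetches the up-to-date value.
public class WriteInvalidate {
    private final Map<Integer, Integer> memory = new HashMap<>(); // main memory
    private final List<Map<Integer, Integer>> caches = new ArrayList<>();

    public WriteInvalidate(int nCaches) {
        for (int i = 0; i < nCaches; i++) caches.add(new HashMap<>());
    }

    public int read(int cacheId, int addr) {
        Map<Integer, Integer> c = caches.get(cacheId);
        if (!c.containsKey(addr))                       // miss:
            c.put(addr, memory.getOrDefault(addr, 0));  // fetch from memory
        return c.get(addr);
    }

    public void write(int cacheId, int addr, int value) {
        memory.put(addr, value);                          // write-through, for simplicity
        for (int i = 0; i < caches.size(); i++)
            if (i != cacheId) caches.get(i).remove(addr); // invalidate other copies
        caches.get(cacheId).put(addr, value);
    }
}
```

Real protocols such as MESI avoid this write-through traffic by tracking per-line states (modified, exclusive, shared, invalid) and consulting memory only when no cache holds a modified copy.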

Many SMP systems contain their own secondary storage to hold large quantities of information, both program code and user data, and do so in a persistent manner so as not to lose stored information after the associated applications end, other users use the system, or the system is powered down. Mass storage has commonly been achieved through hard magnetic disk technology with one or more spinning disk drives. More recently, although with somewhat lower density, solid-state drives (SSDs) have served this purpose. While more expensive, SSDs exhibit superior access and cycle times and better reliability as they have no moving parts. Mass storage presents two logical interfaces to the user. Explicitly, it supports the file system, consisting of a graph structure of directories, each holding other directories and end-user files of data and programs. A complete set of specific file and directory access service calls is made available to users as part of the operating system to use the secondary storage. A second abstraction presented by mass storage is as part of the virtual memory system, where "pages" of block data with virtual addresses may be kept on disk and swapped in and out of primary memory as needed. When a page request is made for data that is not found in memory, a page fault is indicated and the operating system performs the necessary tasks to make room for the requested page in main memory by moving a less-used page onto disk and then bringing the desired page into memory while updating various tables. This is performed transparently to the user, but can take more than a million times longer than a similar data access request to cache. Some SMP nodes, especially those used as subsystems of commodity clusters or MPPs, may not include their own secondary storage. Referred to as "diskless nodes", these will instead share secondary storage that is itself a subsystem of the supercomputer, or even external file systems shared by multiple computers and workstations. Diskless nodes are smaller, cheaper, lower energy, and more reliable.
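
Page replacement is easy to simulate. The sketch below is an illustrative model, not an operating-system implementation: it counts page faults for a reference string under least-recently-used (LRU) replacement, the policy approximated by the "moving a less-used page onto disk" step described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: counting page faults under LRU replacement.
// A LinkedHashMap in access order keeps the least-recently-used page
// eldest, so evicting the eldest entry implements LRU.
public class LruPager {
    private int faults = 0;
    private final LinkedHashMap<Integer, Boolean> resident;

    public LruPager(int frames) {
        this.resident = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > frames;            // evict the LRU page
            }
        };
    }

    public void access(int page) {
        if (!resident.containsKey(page)) faults++; // page fault: fetch from disk
        resident.put(page, true);                  // marks page most recently used
    }

    public int faults() { return faults; }
    public boolean isResident(int page) { return resident.containsKey(page); }
}
```

For the reference string 1, 2, 3, 1, 4 with three frames, the first three accesses and the access to page 4 fault, the second access to page 1 hits, and page 2 (the least recently used) is the one pushed out.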

Every SMP has multiple I/O channels that communicate with external devices (outside the SMP), user interfaces, data storage, system area networks, local area networks, and wide area networks, among others. Every user is familiar with many of these, as they are also found on deskside and laptop systems. For local area and system area networks, interfaces are most often provided to Ethernet and InfiniBand (IB) to connect to other SMPs of a larger cluster or institutional environments such as shared mass storage, printers, and the internet. The universal serial bus (USB) has become so widely employed for diverse purposes, including portable flash drives, that it is ubiquitous and available on essentially everything larger than a screen pad or laptop, and certainly on any deskside or rack-mounted SMP. JTAG is widely employed for system administration and maintenance. The Serial Advanced Technology Attachment (SATA) interface is widely used for external disk drives. Video graphics array and high-definition multimedia interface ports provide direct connection to high-resolution video screens. There is usually a connection specifically provided for a directly connected user keyboard. Depending on the system, there may be a number of other I/O interfaces.

URL:

https://www.sciencedirect.com/science/article/pii/B978012420158300006X

Sensor Network Platforms and Tools

Feng Zhao, Leonidas J. Guibas, in Wireless Sensor Networks, 2004

7.1 Sensor Node Hardware

Sensor node hardware can be grouped into three categories, each of which entails a different set of trade-offs in the design choices.

Augmented general-purpose computers: Examples include low-power PCs, embedded PCs (e.g., PC104), custom-designed PCs (e.g., Sensoria WINS NG nodes), 1 and various personal digital assistants (PDAs). These nodes typically run off-the-shelf operating systems such as Win CE, Linux, or real-time operating systems and use standard wireless communication protocols such as Bluetooth or IEEE 802.11. Because of their relatively high processing capability, they can accommodate a wide variety of sensors, ranging from simple microphones to more sophisticated video cameras.

Compared with dedicated sensor nodes, PC-like platforms are more power hungry. However, when power is not an issue, these platforms have the advantage that they can leverage the availability of fully supported networking protocols, popular programming languages, middleware, and other off-the-shelf software.

Dedicated embedded sensor nodes: Examples include the Berkeley mote family [98], the UCLA Medusa family [202], Ember nodes, 2 and the MIT µAMP [32]. These platforms typically use commercial off-the-shelf (COTS) chip sets with emphasis on small form factor, low-power processing and communication, and simple sensor interfaces. Because of their COTS CPU, these platforms typically support at least one programming language, such as C. However, in order to keep the program footprint small to accommodate their small memory size, programmers of these platforms are given full access to hardware but barely any operating system support. A classical example is the TinyOS platform and its companion programming language, nesC. We will discuss these platforms in Sections 7.3.1 and 7.3.2.

System-on-chip (SoC) nodes: Examples of SoC hardware include smart dust [109], the BWRC picoradio node [187], and the PASTA node. 3 Designers of these platforms try to push the hardware limits by fundamentally rethinking the hardware architecture trade-offs for a sensor node at the chip design level. The goal is to find new ways of integrating CMOS, MEMS, and RF technologies to build extremely low power and small footprint sensor nodes that still provide certain sensing, computation, and communication capabilities. Since most of these platforms are currently in the research pipeline with no predefined instruction set, there is no software platform support available.

Among these hardware platforms, the Berkeley motes, due to their small form factor, open source software development, and commercial availability, have gained wide popularity in the sensor network research community. In the following section, we give an overview of the Berkeley MICA mote.

7.1.1 Berkeley Motes

The Berkeley motes are a family of embedded sensor nodes sharing roughly the same architecture. Figure 7.1 shows a comparison of a subset of mote types.

Figure 7.1. A comparison of Berkeley motes.

Let us take the MICA mote as an example. The MICA motes have a two-CPU design, as shown in Figure 7.2. The main microcontroller (MCU), an Atmel ATmega103L, takes care of regular processing. A separate and much less capable coprocessor is only active when the MCU is being reprogrammed. The ATmega103L MCU has integrated 512 KB flash memory and 4 KB of data memory. Given these small memory sizes, writing software for motes is challenging. Ideally, programmers should be relieved from optimizing code at assembly level to keep the code footprint small. However, high-level support and software services are not free. Being able to mix and match only necessary software components to support a particular application is essential to achieving a small footprint. A detailed discussion of the software architecture for motes is given in Section 7.3.1.

Figure 7.2. MICA mote architecture.

In addition to the memory inside the MCU, a MICA mote also has a separate 512 KB flash memory unit that can hold data. Since the connection between the MCU and this external memory is via a low-speed serial peripheral interface (SPI) protocol, the external memory is more suited for storing data for later batch processing than for storing programs. The RF communication on MICA motes uses the TR1000 chip set (from RF Monolithics, Inc.) operating in the 916 MHz band. With hardware accelerators, it can achieve a maximum raw data rate of 50 kbps. MICA motes implement a 40 kbps transmission rate. The transmission power can be digitally adjusted by software through a potentiometer (Maxim DS1804). The maximum transmission range is about 300 feet in open space.

Like other types of motes in the family, MICA motes support a 51-pin I/O extension connector. Sensors, actuators, serial I/O boards, or parallel I/O boards can be connected via the connector. A sensor/actuator board can host a temperature sensor, a light sensor, an accelerometer, a magnetometer, a microphone, and a beeper. The serial I/O (UART) connection allows the mote to communicate with a PC in real time. The parallel connection is primarily for downloading programs to the mote.

It is interesting to look at the energy consumption of various components on a MICA mote. As shown in Figure 7.3, radio transmission bears the maximum power consumption. However, each radio packet (e.g., 30 bytes) only takes 4 ms to send, while listening to incoming packets keeps the radio receiver on all the time. The energy that can send one packet only supports the radio receiver for about 27 ms. Another observation is that there are huge differences among the power consumption levels in the active mode, the idle mode, and the suspend mode of the MCU. It is thus worthwhile from an energy-saving point of view to suspend the MCU and the RF receiver as long as possible.
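
That ratio can be checked with a back-of-the-envelope calculation. The power figures below are illustrative assumptions chosen only to match the text (roughly 12 mW while transmitting, roughly 1.8 mW while receiving), not measured values from Figure 7.3: the energy spent transmitting for 4 ms runs the receiver for roughly 27 ms.

```java
// Sketch: radio energy budget with illustrative (assumed) power draws.
public class RadioEnergy {
    // Energy to send one packet (mW * ms = microjoules), divided by the
    // receive power, gives the receiver-on time that same energy buys.
    public static double receiverMsPerPacket(double txPowerMw, double txMs,
                                             double rxPowerMw) {
        double packetEnergyUj = txPowerMw * txMs;
        return packetEnergyUj / rxPowerMw;
    }

    public static void main(String[] args) {
        // Assumed values: ~12 mW transmit, ~1.8 mW receive, 4 ms per packet.
        System.out.printf("%.1f ms of listening per packet sent%n",
                receiverMsPerPacket(12.0, 4.0, 1.8));
    }
}
```

With these assumed numbers the result is about 27 ms, matching the observation that listening, not transmitting, dominates the radio's energy budget.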

Figure 7.3. Power consumption of MICA motes.

URL:

https://www.sciencedirect.com/science/article/pii/B9781558609143500079

The iPod

Mike Kuniavsky, in Smart Things, 2010

9.2.2.1 iTunes

iTunes is a general-purpose computer avatar. It is distinguished from other Store avatars by its breadth of functionality and its role as a gateway to the Store for devices that could not directly connect. Other store avatars specialize in a certain subset of functionality, but iTunes (Figure 9-7) contains nearly all of the management and playback functionality of the other products. This functional heterogeneity also distinguishes it from Apple's other software products, which typically focus on creating and editing only a single media format. In contrast, the iTunes feature list includes functions as varied as CD burning, Internet radio listening, podcast subscription, ringtone creation, and digital video downloading.

Figure 9-7. The author's iTunes 8, showing basic music playing mode.

In addition to delivering all of the elements of the service described above, it also controls the other avatars. It is used to load content onto iPods, to synchronize downloaded video content with Apple TV, and to send content to AirTunes, an Apple technology for streaming music between devices over a local network.

With the iPhone, Apple placed the iTunes Store on the actual device. By making iTunes unnecessary for buying music, it moved the control point directly to the hardware avatar. Until this change, the iTunes service was organized as a hub-and-spoke model, in which iTunes was the hub, and each specialized avatar a spoke.

URL:

https://www.sciencedirect.com/science/article/pii/B9780123748997000096

Concurrency in the Cloud

Dan C. Marinescu, in Cloud Computing (Second Edition), 2018

3.14 Multithreading and Concurrency in Java; FlumeJava

Java is a general-purpose computer programming language designed with portability in mind at Sun Microsystems. 4 Java applications are typically compiled to bytecode and can run on a Java Virtual Machine (JVM) regardless of the computer architecture. Java is a class-based, object-oriented language with support for concurrency. It is one of the most popular programming languages and is widely used for a wide range of applications running on mobile devices and computer clouds.

Java Threads. Java supports processes and threads. Recall that a process has a self-contained execution environment, with its own private address space and run-time resources. A thread is a lightweight entity within a process. A Java application starts with one thread, the main thread, which can create additional threads.
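
A minimal sketch of the main thread creating a worker and waiting for it:

```java
// Sketch: the main thread creates a worker thread, starts it, and
// joins it (waits for it to finish) before reading the result.
public class MainThreadDemo {
    static volatile int result = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> result = 6 * 7);
        worker.start();     // worker runs concurrently with main
        worker.join();      // main blocks until the worker finishes
        System.out.println("worker computed " + result);
    }
}
```

The `volatile` modifier on the shared field makes the worker's write visible to the main thread; without it (or the happens-before guarantee of `join`), the read could observe a stale value.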

Memory consistency errors occur when different threads have inconsistent views of the same data. Synchronized methods and synchronized statements are the two idioms for synchronization. Serialization of critical sections is achieved by specifying the synchronized attribute in the definition of a class or method. This guarantees that only one thread at a time can execute the critical section and that each thread entering the section sees the modifications already done. Synchronized statements must specify the object that provides the intrinsic lock.
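
A synchronized method is the simpler of the two idioms. In the sketch below, four threads increment a shared counter; because both methods are declared synchronized, every increment happens under the object's intrinsic lock and the final count is deterministic.

```java
// Sketch: a critical section protected by synchronized methods.
// Without the synchronized keyword, count++ (a read-modify-write
// sequence) could interleave between threads and lose updates.
public class SyncCounter {
    private int count = 0;

    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }

    public static int run(int nThreads, int perThread) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();   // wait for all workers
        return c.get();                 // always nThreads * perThread
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("final count: " + run(4, 10_000));
    }
}
```

The equivalent synchronized statement form would be `synchronized (this) { count++; }`, which names the lock object explicitly.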

The current versions of Java support atomic operations on several datatypes with methods such as getAndDecrement(), getAndIncrement() and getAndSet(). An effective way to control data sharing among threads is to share only immutable data. A class is made immutable by marking all its fields as final and declaring the class as final.
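
These atomic methods live in the java.util.concurrent.atomic classes; a short sketch with AtomicInteger:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: lock-free updates with AtomicInteger. Each call is a single
// atomic read-modify-write, so no synchronized block is needed.
public class AtomicDemo {
    public static int[] demo() {
        AtomicInteger n = new AtomicInteger(5);
        int before = n.getAndIncrement(); // returns 5; n is now 6
        int swapped = n.getAndSet(0);     // returns 6; n is now 0
        return new int[]{before, swapped, n.get()};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.printf("getAndIncrement=%d getAndSet=%d final=%d%n",
                r[0], r[1], r[2]);
    }
}
```

Each method returns the value held before the update, which is what makes them useful as building blocks for lock-free algorithms.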

A Thread in the java.lang.Thread class executes an object of type java.lang.Runnable. The java.util.concurrent package provides better support for concurrency than the Thread class. This package reduces the overhead of thread creation and prevents too many threads from overloading the CPU and depleting the available storage. A thread pool is a collection of worker threads and contains a queue of tasks waiting to be executed.
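
A fixed-size pool from java.util.concurrent replaces manual Thread management; a sketch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a fixed thread pool draining a queue of Runnable tasks.
// The pool reuses two threads instead of creating one thread per task.
public class PoolDemo {
    public static int runTasks(int nTasks) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < nTasks; i++)
            pool.execute(completed::incrementAndGet);  // queue a Runnable
        pool.shutdown();                               // accept no new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);   // wait for the queue to drain
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(100) + " tasks completed");
    }
}
```

All one hundred tasks complete on just two worker threads; the queue, not the thread count, absorbs the backlog.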

Threads can communicate with one another via interrupts. A thread sends an interrupt by invoking interrupt on the Thread object of the thread to be interrupted. The thread to be interrupted is expected to support its own interruption. Thread.sleep causes the current thread to suspend execution for a specified period.
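
A sketch of one thread interrupting another that is sleeping:

```java
// Sketch: interrupting a sleeping thread. Thread.sleep throws
// InterruptedException when the thread is interrupted, which is the
// hook through which the target supports its own interruption.
public class InterruptDemo {
    static volatile boolean wasInterrupted = false;

    public static void run() throws InterruptedException {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);       // would sleep for a minute
            } catch (InterruptedException e) {
                wasInterrupted = true;      // the interrupt cut the sleep short
            }
        });
        sleeper.start();
        sleeper.interrupt();                // deliver the interrupt
        sleeper.join();
    }

    public static void main(String[] args) throws InterruptedException {
        run();
        System.out.println("sleeper interrupted: " + wasInterrupted);
    }
}
```

If the interrupt arrives before the worker reaches sleep, the interrupt status is simply remembered and sleep throws immediately, so the outcome is the same either way.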

The executor framework works with Runnable objects, which cannot return results to the caller. The alternative is to use java.util.concurrent.Callable. A Callable object returns its result through an object of type java.util.concurrent.Future. The Future object can be used to check the status of a Callable object and to retrieve the result from it. However, the Future interface has limitations for asynchronous execution, and CompletableFuture extends the functionality of the Future interface for asynchronous execution.
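
Both styles in one sketch: a Callable submitted for a Future, and a CompletableFuture chain.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: Callable returns a value through a Future; CompletableFuture
// additionally supports chaining further work onto the async result.
public class FutureDemo {
    public static int viaFuture() throws Exception {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> 6 * 7; // unlike Runnable, returns a value
            Future<Integer> f = ex.submit(task);
            return f.get();                       // blocks until the result is ready
        } finally {
            ex.shutdown();
        }
    }

    public static int viaCompletableFuture() {
        return CompletableFuture.supplyAsync(() -> 21)
                .thenApply(x -> x * 2)            // chained onto the async result
                .join();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaFuture() + " " + viaCompletableFuture());
    }
}
```

The thenApply step is exactly what the plain Future interface lacks: with Future the caller can only block on get(), while CompletableFuture lets further stages be attached without blocking.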

Non-blocking algorithms based on low-level atomic hardware primitives such as compare-and-swap (CAS) are supported by Java 5.0 and later versions. The fork-join framework introduced in Java 7 supports the distribution of work to several workers and then waiting for their completion. The join method allows one thread to wait for the completion of another.
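
The fork-join pattern can be sketched with a RecursiveTask: a task splits its range in half, forks one half to another worker, computes the other half itself, and joins to combine the results.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch: summing 1..n with the fork-join framework. Each task splits
// its range, forks the left half, computes the right half itself, and
// joins to combine the two partial sums.
public class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 100;
    private final long lo, hi;   // sum the inclusive range [lo, hi]

    public ForkJoinSum(long lo, long hi) { this.lo = lo; this.hi = hi; }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {        // small enough: sum directly
            long s = 0;
            for (long i = lo; i <= hi; i++) s += i;
            return s;
        }
        long mid = (lo + hi) / 2;
        ForkJoinSum left = new ForkJoinSum(lo, mid);
        left.fork();                       // run the left half in parallel
        long right = new ForkJoinSum(mid + 1, hi).compute();
        return left.join() + right;        // wait for the left half
    }

    public static long sum(long n) {
        return new ForkJoinPool().invoke(new ForkJoinSum(1, n));
    }

    public static void main(String[] args) {
        System.out.println(sum(1000));
    }
}
```

The pool's work-stealing scheduler keeps idle workers busy with forked subtasks, which is what makes this divide-and-conquer shape efficient.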

FlumeJava. A Java library used to develop, test, and run efficient data parallel pipelines is described in [92]. FlumeJava is used to develop data parallel applications such as MapReduce, discussed in Section 7.5.

At the heart of the system is the concept of a parallel collection, which abstracts the details of data representation. Data in a parallel collection can be an in-memory data structure, one or more files, BigTable, discussed in Section 6.9, or a MySQL database. Data-parallel computations are implemented by composition of several operations on parallel collections.

In turn, parallel operations are implemented using deferred evaluation. The invocation of a parallel operation records the operation and its arguments in an internal graph structure representing the execution plan. Once completed, the execution plan is optimized.

The most important classes of the FlumeJava library are PCollection&lt;T&gt;, used to specify an immutable bag of elements of type T, and PTable&lt;K,V&gt;, representing an immutable multi-map with keys of type K and values of type V. The internal state of a PCollection object is either deferred or materialized, i.e., not yet computed or computed, respectively. The PObject&lt;T&gt; class is a container for a single Java object of type T and can be either deferred or materialized.

parallelDo() supports element-wise computation over an input PCollection&lt;T&gt; to produce a new output PCollection&lt;S&gt;. This primitive takes as its main argument a DoFn&lt;T,S&gt;, a function-like object defining how to map each value in the input into zero or more values in the output. In the following example from [92], collectionOf(strings()) specifies that the parallelDo() operation should produce an unordered PCollection whose String elements should be encoded using UTF-8. 5

Other primitive operations are groupByKey(), combineValues() and flatten().

groupByKey() converts a multi-map of type PTable&lt;K,V&gt;, in which multiple key/value pairs may share the same key, into a uni-map of type PTable&lt;K, Collection&lt;V&gt;&gt; where each key maps to an unordered, plain Java Collection of all the values with that key.

combineValues() takes an input PTable&lt;K, Collection&lt;V&gt;&gt; and an associative combining function on Vs, and returns a PTable&lt;K,V&gt; where each input collection of values has been combined into a single output value.

flatten() takes a list of PCollection&lt;T&gt;s and returns a single PCollection&lt;T&gt; that contains all the elements of the input PCollections.
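
The semantics of groupByKey() and combineValues() can be illustrated with plain Java collections. The sketch below is an analogy only, not the FlumeJava API: it applies the same two transformations eagerly and in memory, whereas FlumeJava would record them in a deferred execution plan and run them in parallel.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Sketch (plain-Java analogy, not FlumeJava): groupByKey turns a list of
// key/value pairs into key -> collection of values; combineValues folds
// each collection with an associative combining function.
public class GroupCombine {
    public static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : pairs)
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(e.getValue());
        return grouped;
    }

    public static <K, V> Map<K, V> combineValues(Map<K, List<V>> grouped,
                                                 BinaryOperator<V> f) {
        Map<K, V> combined = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
            V acc = e.getValue().get(0);
            for (int i = 1; i < e.getValue().size(); i++)
                acc = f.apply(acc, e.getValue().get(i)); // associative fold
            combined.put(e.getKey(), acc);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
                new AbstractMap.SimpleEntry<>("a", 1),
                new AbstractMap.SimpleEntry<>("a", 2),
                new AbstractMap.SimpleEntry<>("b", 3));
        Map<String, List<Integer>> grouped = groupByKey(pairs);
        System.out.println(grouped + " -> " + combineValues(grouped, Integer::sum));
    }
}
```

Associativity of the combining function is what lets a real runtime fold partial results in any grouping and order across workers.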

Pipelined operations are implemented by composition of functions. For example, if the output of function f is applied as input of function g in a ParallelDo operation, then two ParallelDo operations compute f and the composition of f and g. The optimizer is only concerned with the structure of the execution plan and not with the optimization of user-defined functions.

FlumeJava traverses the operations in the plan of a batch application in forward topological order, and executes each operation in turn. Independent operations are executed simultaneously. FlumeJava exploits not only the task parallelism but also the data parallelism within operations.

URL:

https://www.sciencedirect.com/scientific discipline/article/pii/B9780128128107000042

Database Machines

Catherine M. Ricardo, in Encyclopedia of Information Systems, 2003

2. Functions of a Database Machine

In a traditional database environment, a general-purpose computer is used to run the database management system (DBMS), as well as a variety of other software and applications under its operating system. The database files reside on a disk that is under the computer's control. When a user or application program requests data, the computer processes the request and manages the disk controllers to access the data files. In a database machine environment, the general-purpose computer, called the host, does not run the DBMS software. Instead the DBMS runs on the database machine, a separate computer that controls the devices on which the database files reside. When a user or program requests data access, the request is submitted to the host, which passes it to the database machine for processing. The dedicated machine then performs the following functions:

Accepts the data request and identifies which stored records will be needed to satisfy the request

Checks that the user is authorized to access those items and to perform the requested operations on them

Chooses the best path for data access

Performs concurrency control so that other data requests submitted at the same time do not cause errors; this is necessary if at least one of the requests is an update

Handles the recovery subsystem, to ensure that the database can be restored to a correct state in the event of a transaction or system failure

Maintains data integrity, checking that no integrity constraints are violated

Directs the actual data access using its device controllers

Handles data encryption, if used

Formats the retrieved data, if any

Returns the data or results to the host machine
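
The division of labor can be sketched as two cooperating components. The interfaces and names below are hypothetical, invented purely for illustration; they mirror only the flow described above, in which the host forwards a request and the database machine performs the authorization check, the data access, and the return of results.

```java
import java.util.List;
import java.util.Map;

// Sketch (hypothetical interfaces): the host does no DBMS work; it
// forwards requests to the database machine, which authorizes the user,
// accesses the data, and returns the results to the host.
public class DatabaseMachineDemo {
    interface DatabaseMachine {
        List<String> process(String user, String request);
    }

    // A toy in-memory "database machine". The host never touches the
    // stored data; the machine checks authorization and does the lookup.
    static class ToyMachine implements DatabaseMachine {
        private final Map<String, List<String>> tables;
        private final List<String> authorizedUsers;

        ToyMachine(Map<String, List<String>> tables, List<String> authorizedUsers) {
            this.tables = tables;
            this.authorizedUsers = authorizedUsers;
        }

        @Override
        public List<String> process(String user, String table) {
            if (!authorizedUsers.contains(user))          // authorization check
                throw new SecurityException("not authorized: " + user);
            return tables.getOrDefault(table, List.of()); // data access + result
        }
    }

    // The host: accepts the request and passes it straight through.
    static List<String> host(DatabaseMachine m, String user, String request) {
        return m.process(user, request);
    }

    public static void main(String[] args) {
        DatabaseMachine m = new ToyMachine(
                Map.of("parts", List.of("bolt", "nut")), List.of("alice"));
        System.out.println(host(m, "alice", "parts"));
    }
}
```

The concurrency control, recovery, and integrity functions listed above would all live inside the machine component as well; the host's role never grows beyond forwarding requests and receiving results.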

URL:

https://www.sciencedirect.com/science/article/pii/B0122272404000277