The Design of a Custom 32-Bit SIMD Enhanced Digital Signal Processor

For a number of years, the hardware industry has seen a drastic rise in embedded applications. Thanks to the Internet of Things (IoT) revolution, a majority of these embedded applications are shifting towards simple hardware that can run on batteries while still handling complex data and implementing complex algorithms. In digital design terms, the hardware is expected to be power efficient, small, and simple, while meeting real-time constraints and processing mathematical algorithms. Most modern DSPs, by contrast, target high performance and a wide range of applications, which usually results in higher power consumption and more complex hardware.

Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit

Present-day SoCs house analog, digital, and mixed-signal components on the same chip, and the processor plays a vital role in this environment. As technology shrinks to sub-micrometer nodes, there is considerable scope for undesirable hazards in processors. These hazards may cause area, power, and timing to deviate from their desired values. Our paper focuses mainly on solving some of these issues by introducing an enhanced version of MIPS. MIPS (Microprocessor without Interlocked Pipeline Stages) is a RISC architecture used in the semiconductor industry. This paper concentrates on designing the architecture in Verilog HDL.

Design of 32 bit MAC Unit for Complex Numbers in VHDL

This paper describes a 32×32-bit MAC unit designed using the Dadda multiplier algorithm. In recent years, multiply-accumulate (MAC) units have been developed for various high-performance applications. A MAC unit performs multiplication followed by accumulation; a basic MAC unit consists of a multiplier, an adder, and an accumulator. A Dadda multiplier is used in the MAC unit, and the comparison is based on power, speed, and area. The proposed method aims to achieve low propagation delay and low resource utilization and to increase processor speed. The speed of the multiplier is very important to any digital signal processor (DSP). In this paper, a 32-bit Dadda multiplier and a 64-bit carry look-ahead adder (CLA) are used. CLAs are widely used because they avoid waiting for the carry to ripple through every bit position: the carry signals are calculated in advance from the input signals. The proposed work is coded in VHDL, and the analysis of speed, power, and area is carried out using the Xilinx ISE 13.1 tool.
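The carry look-ahead idea mentioned in this abstract can be sketched in a few lines of Python. This is only a behavioural illustration under assumptions: the function name `carry_lookahead_add` and its parameters are hypothetical and not taken from the paper, and the loop evaluates the generate/propagate carry equations one after another, whereas hardware would expand them so that each carry depends only on the inputs.

```python
def carry_lookahead_add(a: int, b: int, width: int = 64, carry_in: int = 0):
    """Add two unsigned width-bit integers using generate/propagate signals.

    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Per-bit signals, available as soon as the inputs are known:
    # generate  g[i] = a[i] AND b[i]  (this position creates a carry)
    # propagate p[i] = a[i] XOR b[i]  (this position passes a carry on)
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Carry equations: c[i+1] = g[i] OR (p[i] AND c[i]).
    # A hardware CLA flattens this recurrence; here it is evaluated in order.
    c = [carry_in & 1]
    for i in range(width):
        c.append(g[i] | (p[i] & c[i]))

    # Sum bit i is p[i] XOR c[i].
    total = 0
    for i in range(width):
        total |= (p[i] ^ c[i]) << i
    return total, c[width]
```

For example, `carry_lookahead_add(255, 1, width=8)` wraps to a zero sum with a carry out of 1.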

Design of a High Speed 32 Bit Parallel Hybrid Adder for Digital Arithmetic System

Abstract: Addition is a heavily used fundamental arithmetic operation that figures prominently in any digital logic system, digital signal processor, control system, and scientific application. Addition is hardware intensive, and users are mostly concerned with achieving smaller area and higher speed. In an ALU, adders play a major role not only in addition but also in other basic arithmetic operations such as subtraction and multiplication. Hence, an efficient adder is required for better performance of the ALU and therefore of the processor. This paper presents the design of 32-bit Parallel Hybrid Adder architectures built from Ripple Carry, Carry Look Ahead, and Carry Select Adders. Time delay and area are analyzed, and the results show how area and speed vary across the designs. The designed adder is a parallel implementation of 8-bit Ripple Carry Adder and 8-bit Carry Look Ahead Adder blocks that together form the 32-bit Parallel Hybrid Adder. The 32-bit Parallel Hybrid Adder is synthesized for the XC3S1600 Spartan-3E FPGA, implemented in 90 nm technology.

A 32-Bit RISC Processor for Convolution Application

Convolution is an important operation that is widely used in digital signal processing applications, and many algorithms have been proposed to improve the performance of the filters used. The proposed RISC processor follows the Von Neumann architecture and is non-pipelined, with a load-store architecture and a 32-bit instruction format. The processor provides arithmetic instructions, logical instructions, instructions that operate directly on data, instructions that pause processing until the next interrupt arrives, jump instructions, and load and store instructions. The proposed design offers high speed, low power, and area efficiency.

32-Bit RISC and DSP System Design in an FPGA

ABSTRACT: Reduced Instruction Set Computer (RISC) cores use fewer instructions with simple constructs, and therefore they can be executed much faster within the CPU without having to use memory as often. When combined with a digital signal processor (DSP) system, they can perform several operations quickly and efficiently. Here, the project presents a system with RISC and DSP that uses the VHSIC Hardware Description Language (VHDL) and a field-programmable gate array (FPGA) to improve speed and functionality. The system offers a variety of features, including arithmetic operations and the Fourier transform. The design will be useful in several areas, including Android phones.

Development of single board computer based on 32-bit 5-stage pipeline RISC processor

In the 21st century, embedded system design is a popular alternative to typical microprocessor design, as it takes advantage of application characteristics to optimize the design for adequate performance at lower cost. A Single Board Computer is a standalone digital system capable of performing logical computation and data manipulation. It has a CPU (Central Processing Unit), a memory controller hub, and an I/O device controller hub (interface chip) embedded on a single platform such as an SoC (System-on-Chip) or embedded system. It is an economical and portable digital system with optimized use of logic gates and devices. A Single Board Computer can synchronize data transfer between the CPU and I/O peripheral devices, perform CPU operations, and run programs written in machine code that use all of its interfacing hardware. This thesis proposes a Single Board Computer design in Verilog RTL that extends a previous UTM student's research on a 32-bit 5-stage pipeline RISC processor and targets FPGA implementation in System-on-Chip (SoC) designs. The ISA (Instruction Set Architecture) of the RISC (Reduced Instruction Set Computer) processor is enhanced to cover control instructions. I/O controllers are designed to support the input of data and the display of output data. The Single Board Computer is designed in compact form and generalized to comply with RISC CPU specifications and some basic I/O protocols, making it a valuable asset in the UTM soft-core IP bank for future SoC research.

MasPar MP-1: An SIMD Array Processor

A variety of compression techniques, including both split and merge-based vector quantizer, JPEG, and subband coding offer similar image compression performance. Many papers reviewed utilize special purpose hardware or serial algorithms to implement VQ. Compression ratios in the literature range from 4:1 to 40:1. The implementations presented here for the MasPar MP-1 [10] demonstrate the feasibility of using a commercially available, massively parallel SIMD machine for codebook generation and encoding and decoding of images. Because of the large number of processors available on the MP-1, and the parallel nature of many parts of the algorithms, the encoding, decoding, and codebook generation execution times obtained are very low. An advantage of using a massively parallel system rather than special purpose hardware is flexibility; e.g., codebook and codeword sizes can be easily changed, and pre- and post-processing routines can be performed using the same system.

Design and Simulation of a Modified 32-bit ROM-based Direct Digital Frequency Synthesizer on FPGA

There is a π/2 phase difference between the sine and cosine functions, so the trigonometric phase mapping technique needs only one coarse LUT to generate both the sine and the cosine coarse values. In this research, a ROM with two address lines and two outputs is used. In the proposed structure, the values of the sine function between 0 and π/2 are stored in the coarse LUT. The N-bit phase value is split into N/2+1 bits as the integer part and N/2-1 bits as the fraction part. The two MSBs of the integer part act as enable bits for the complementers, which adjust the address lines and the outputs of the ROM for each quarter cycle, as Figure 7 illustrates. For instance, in the first quarter cycle address line 1 remains unchanged, because the first quarter cycle of the sine samples is what is stored, but address line 2 is complemented because of the π/2 phase difference between the sine and cosine samples. This work uses 1's complement for the complementing, which can be implemented simply with XOR gates. A ½ LSB phase shift (half of the phase accuracy) must be applied to the sine samples; otherwise 0 and π/2 would share the same address under 1's complement, causing an error that degrades the SFDR. To produce an error-free negative half cycle, a −½ LSB offset is included in the amplitude values. Similarly, a fine LUT optimization method is used, and the size of each fine memory location can be reduced from 22 bits to 14 bits without affecting the SFDR. Figure 7 illustrates the main structure (coarse part) of the proposed method, and Figure 8 shows the block diagram of the proposed architecture as simulated in MATLAB Simulink.
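The quarter-wave addressing described here can be illustrated with a small floating-point model in Python. It is a sketch under assumptions, not the paper's design: `ADDR_BITS`, `LUT`, `sine_from_lut`, and `cosine_from_lut` are hypothetical names, the table is a single coarse table with no fine LUT, and the cosine is obtained by advancing the phase by one quadrant rather than through a second ROM address line. It shows why the half-LSB phase offset makes 1's-complement (XOR) addressing exact.

```python
import math

ADDR_BITS = 6              # assumed table address width, for illustration only
M = 1 << ADDR_BITS         # number of entries covering 0..pi/2

# Quarter-wave sine table. Each entry is sampled at (k + 0.5) steps, i.e. with
# a half-LSB phase offset, so that mirroring index k to (M - 1 - k) is exact.
LUT = [math.sin((k + 0.5) * math.pi / (2 * M)) for k in range(M)]


def sine_from_lut(phase: int) -> float:
    """Sine for a phase word of ADDR_BITS + 2 bits (wraps modulo 4 * M)."""
    quadrant = (phase >> ADDR_BITS) & 0b11   # two MSBs select the quarter cycle
    k = phase & (M - 1)
    if quadrant & 1:
        k ^= M - 1           # 1's complement of the address, done with XOR
    value = LUT[k]
    return -value if quadrant & 2 else value  # second half cycle is negated


def cosine_from_lut(phase: int) -> float:
    """Cosine from the same table: advance the phase by pi/2 (one quadrant)."""
    return sine_from_lut(phase + M)
```

With the offset in place, every phase word `p` maps to `sin((p + 0.5) * pi / (2 * M))` using only the stored first quarter cycle.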

Compiler Optimization for SIMD type Vector Processor

Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously [5,6]. A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible. Ordinary programs are typically written under a sequential execution model where instructions execute one after the other and in the order specified by the programmer. ILP allows the compiler and the processor to overlap the execution of multiple instructions or even to change the order in which instructions are executed [7].

Synthesization of Low Power Digital Signal Processor Architecture

Abstract: A wireless sensor network consists of spatially distributed autonomous sensors that monitor physical or environmental conditions, such as temperature, sound, and pressure. Radio communication exhibits the highest energy consumption in wireless sensor nodes. This paper describes the design of the newly proposed folded-tree architecture for on-the-node data processing in wireless sensor networks, using parallel prefix operations and data locality in hardware.

A programming system for parallel digital signal processor networks

The flow-graph is then partitioned using mean field annealing, and the nodes assigned to each partition are scheduled to achieve maximum processor utilization. Finally, a C program fo[r]

A digital filter using the Intel 2920 signal processor

Therefore, the main focus of this research was to study the feasibility of using the Intel Corporation 2920 signal processing integrated circuit to filter digitally visual evoked respon[r]

FPGA Synthesis of 32 bit MIPS based Pipelined RISC Processor with UART Interface

Abstract: The objective of this paper is to design and implement a 32-bit MIPS (Microprocessor without Interlocked Pipeline Stages) based RISC (Reduced Instruction Set Computer) processor with a UART (Universal Asynchronous Receiver Transmitter) using the Verilog Hardware Description Language (HDL). The proposed processor uses the Von Neumann architecture, with unified instruction and data memory. Its salient feature is pipelining, used mainly to increase performance (at least one instruction is executed every cycle). Nowadays most computers (mini/micro/super) and microcontrollers use serial data communication for information exchange. The communication interface that transmits and receives the serial data is commonly known as a UART. The proposed processor has a serial input port, a serial output port, and 32 32-bit general-purpose registers (for faster instruction fetch). The processor, along with the embedded UART, is synthesized on the Xilinx Spartan 3E CP-132 FPGA Starter board with 0.0517s instruction cycle, and the desired operation is observed. Another advantage of the proposed processor is that it can execute programs with a large number of instructions, so that any practical program can fit.

Architectural Design of 32 Bit Polar Encoder

The four folded parallel pipelined structure for the 32-bit polar encoder is shown in Figure 4. It consists of 10 functional units and 28 delay elements. Each stage has two functional units. Stages 1 and 2 include no delay elements. Stages 3, 4 and 5 have several multiplexers placed in front of each functional unit to configure the inputs of the functional units. The proposed architecture continuously processes four samples/cycle, according to the folding sets and register allocation table. The inputs are in natural order and the outputs are in bit-reversed order.
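As background for the encoder discussed above, the following Python sketch computes a polar transform in software. It models only the arithmetic, not the folded pipelined hardware, and rests on assumptions not stated in the excerpt: the standard 2×2 polar kernel F = [[1, 0], [1, 1]] over GF(2) applied in five XOR butterfly stages for a 32-bit block, with hypothetical function names `polar_encode` and `bit_reversed`.

```python
def polar_encode(u):
    """Multiply the bit list u by the n-fold Kronecker power of F over GF(2).

    len(u) must be a power of two; a 32-bit block needs five XOR stages.
    """
    n = len(u)
    assert n > 0 and n & (n - 1) == 0, "block length must be a power of two"
    x = list(u)
    step = 1
    while step < n:
        # One butterfly stage: the upper element of each pair absorbs the lower.
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def bit_reversed(x):
    """Reorder x so that position i holds the element at the bit-reversed index."""
    n = len(x)
    bits = n.bit_length() - 1
    return [x[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]
```

A caller that wants natural-order input and bit-reversed output, as the excerpt describes, could apply `bit_reversed(polar_encode(u))`.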

An Optimum VLSI Design of a 32-Bit ALU

The performance of every VLSI design is defined by a few fundamental factors: logic delay, power consumption, and the area occupied by the chip. A variety of system specifications are based on these performance criteria. In an ALU, all arithmetic operations are performed by the adder. This project details a 32-bit ALU VLSI architecture. Different adder configurations are explored in detail, and the configuration that meets the performance criteria is used in the ALU architecture. Finally, the 32-bit ALU architecture is completed using mixed logic techniques: CMOS logic implements the fundamental digital functions, pseudo-NMOS logic implements the AND gate, and pass-transistor logic implements the multiplexers. The ALU is described and simulated in HDL (Model simulator). The code is then loaded onto Spartan 3E FPGA kits for real-time realization.

Design and Simulation of 32 Bit Block Processor 
Kunam Bhagya Lakshmi, Sapati Upender & Dr K Manjunathachari

Processors are among the most important devices in the computers we use every day. Before we start, we need to understand what exactly processors are and how they are implemented. A processor is an electronic circuit that functions as the central processing unit (CPU) of a computer, providing computational control. Processors are also used in other advanced electronic systems, such as computer printers, automobiles, jet airliners, and calculators. Typical processors incorporate arithmetic and logic functional units as well as the associated control logic, instruction processing circuitry, and a portion of the memory hierarchy. Portions of the interface logic for the input/output (I/O) and memory subsystems may also be integrated, allowing cheaper overall systems.

DIGITAL FX!32 Running 32-Bit x86 Applications on Alpha NT

The enabler allocates a small area of virtual memory in the address space of the subject by starting a suspended thread (CreateRemoteThread) and using its stack. It changes the protection of that memory to executable, readable, and writable (VirtualProtectEx). It then copies a small piece of code and data into the subject (WriteProcessMemory). The code that it copies simply calls LoadLibrary to load the DIGITAL FX!32 agent DLL and then returns. Note that the code built by the enabler must know the location of the LoadLibrary routine in the subject’s virtual address space. Fortunately, NT arranges for the system DLLs (including KERNEL32.DLL, which contains LoadLibrary) to be at the same virtual address in all processes on the system. Hence, the enabler can just use the address of LoadLibrary in its own address space. The data written to the subject’s memory contains the pointer to LoadLibrary and the full path name of the agent DLL. The enabler then creates a thread of execution in the subject and passes it the address of that data (CreateRemoteThread), raises its priority (SetThreadPriority), and waits for the thread to finish. If all goes well, the subject thread runs and loads the agent DLL into its address space. The agent’s main routine is called automatically, and it goes about its work of enabling the subject process.

Implementation of 32 bit Floating Point Multiplier and Adder for FFT Processor Using VHDL

“Implementation of 32 Bit Binary Floating Point Adder Using IEEE 754 Single Precision Format”. This paper shows the design and simulation of the 32 bit single precision floating point multiplier using VHDL. In this paper, a pipelined architecture is used to increase the speed and performance of the adder, and the IEEE 754 single precision format is used. The floating point adder is synthesized using Xilinx ISE software and simulated in the ISE simulator. [3]
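For readers unfamiliar with the IEEE 754 single precision format referred to here, this Python sketch splits a value into its 1-bit sign, 8-bit biased exponent, and 23-bit fraction fields using the standard `struct` module. The helper names `float32_fields` and `fields_to_float32` are hypothetical and chosen for illustration; they are not part of the cited work.

```python
import struct


def float32_fields(x: float):
    """Return (sign, biased_exponent, fraction) of x rounded to single precision."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, bias 127
    fraction = bits & 0x7FFFFF        # 23 bits, implicit leading 1 for normal numbers
    return sign, exponent, fraction


def fields_to_float32(sign: int, exponent: int, fraction: int) -> float:
    """Reassemble the three fields into the value they encode."""
    bits = (sign << 31) | (exponent << 23) | fraction
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x
```

For instance, 1.0 has a zero sign bit, a biased exponent of 127, and a zero fraction, since the leading 1 of the significand is implicit.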

32-BIT MAC UNIT DESIGN USING VEDIC MULTIPLIER

Multiply-accumulate (MAC) is an important part of real-time digital signal processing (DSP), with applications ranging from digital filtering to image processing. It is a very common basic operation in many DSP designs and algorithms: two numbers are multiplied together and the product is added into an accumulator register. As shown in Fig. 4, the basic MAC unit consists of a multiplier, an adder, and an accumulator. In general, a MAC unit uses a conventional multiplier, which multiplies the multiplier and multiplicand by generating partial products and adding them to compute the final product. The key to the proposed MAC unit is to enhance MAC performance using a Vedic multiplier, and to compare the Vedic, Booth, and conventional multipliers in terms of the computation required to generate the partial products and add them to obtain the final result of the multiplication.
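The multiply-accumulate step described above can be modelled behaviourally in Python. This is a sketch under assumptions: operands are treated as unsigned, the accumulator is assumed to be 64 bits wide and to wrap on overflow, and the names `mac` and `dot_product` are hypothetical. It uses Python's built-in multiplication and does not model the Vedic, Booth, or conventional partial-product schemes being compared.

```python
def mac(acc: int, a: int, b: int, acc_bits: int = 64) -> int:
    """One multiply-accumulate step: return (acc + a * b) wrapped to acc_bits."""
    return (acc + a * b) & ((1 << acc_bits) - 1)


def dot_product(samples, coeffs) -> int:
    """Accumulate pairwise products, as an FIR filter tap sum would."""
    acc = 0
    for x, h in zip(samples, coeffs):
        acc = mac(acc, x, h)
    return acc
```

A digital filter output sample is then a single call such as `dot_product([1, 2, 3], [4, 5, 6])`, which accumulates 4 + 10 + 18.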