In this paper, we have proposed the design of a DSP microcontroller in which the processor load is significantly reduced by offloading math-intensive DSP algorithms to dedicated transform computing modules. It is shown that such transform modules facilitate scalability, reusability, and flexibility in implementing a wide variety of DSP functions. Moreover, real-time DSP performance requirements can be met through high-throughput computation of orthogonal transforms using pipelining and parallel processing. The use of additional data storage and dedicated buses for the DSP functions avoids possible resource-sharing conflicts. The proposed architecture requires only incremental modifications to the instruction set of a conventional microcontroller. Therefore, the DSP hardware of the proposed structure may also serve as a pluggable core that can be attached to a microcontroller whenever DSP algorithms need to be implemented.