Microprocessor design is shifting away from a predominant focus on raw performance toward a balanced approach that optimizes for power as well as performance. Multi-core processors continue this trend: they can divide work across independent execution cores and run tasks on those cores concurrently. In many cases, developers will need to thread their applications to take full advantage of the performance benefits these processors offer. The goal of this paper is to provide an understanding of multi-core architecture, with a focus on shared-memory threading techniques such as OpenMP. It also discusses common threading challenges, such as data races and cache conflicts. Finally, it covers tool support and practical techniques that help developers improve the stability and performance of threaded applications.