Distributed processing is an increasingly common concern in embedded systems. As processors become smaller, cheaper, and more numerous, more products are built from multiple closely coupled 32-bit processors. The software architect is asked to design solutions that can be partitioned among several processors and then repartitioned as the hardware changes. This paper discusses how such systems resemble traditional distributed processing and where they differ. It explores the consequences of these similarities and differences, referring to real-world examples, and recommends how to design distributed applications within a single box that are performant, portable, and reusable.