<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-7454633672335926364</id><updated>2011-08-03T00:36:36.627-07:00</updated><title type='text'>aydozz's technical</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>19</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-8189273558962646909</id><published>2010-10-20T13:27:00.000-07:00</published><updated>2010-10-20T13:38:44.316-07:00</updated><title type='text'>Unintended std::vector blues</title><content type='html'>&lt;span style="font-style:italic;"&gt;std::vector&lt;/span&gt; sometimes has rather obscure semantics. For example, when you're writing generic code with MPI and you use vector to store some local data that needs to be sent/received by another processor, the usual trick of retrieving the beginning address of the internal array by &lt;span style="font-style:italic;"&gt;&amp;my_vector[0]&lt;/span&gt; will fail miserably when T=bool&lt;br /&gt;&lt;br /&gt;This post is about another obscurity. Namely the &lt;span style="font-style:italic;"&gt;reserve()&lt;/span&gt; method. Unless you use push_back()'s, this is rarely useful. Here is why:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;vector &amp;lt; T &gt; v, w;&lt;br /&gt;v.reserve(n);&lt;br /&gt;for(int i=0; i&amp;lt;n; ++i)   v[i] = (T) i;&lt;br /&gt;w.swap(v);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now, guess what? You have two completely empty vectors. The reserve only changed capacity, and assignments with &lt;span style="font-style:italic;"&gt;operator[]&lt;/span&gt; did not need to resize as well since capacity was enough (imho, it should update size whenever an unused location in the vector starts being used, but that's hard to ensure in practice). Therefore, v is zeros-sized even after the for-loop.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-8189273558962646909?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/8189273558962646909/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2010/10/unintended-stdvector-blues.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/8189273558962646909'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/8189273558962646909'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2010/10/unintended-stdvector-blues.html' title='Unintended std::vector blues'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-8478612594519664236</id><published>2010-10-19T23:58:00.000-07:00</published><updated>2010-10-20T00:13:25.551-07:00</updated><title type='text'>Intrusive Data Structures</title><content type='html'>This is about something that standard CS curriculums rarely teach. We always think of data structures as synonyms to containers. We "insert" something, we "delete", or we traverse that "bag" of objects. Then at some point, we figure out that for certain big enough elements, copying data to the container costs more than the actual operations on the container. Later, we go even more crazy and try the discouraged: inserting pointers to the container instead of the actual objects. This leaves us with terrible memory management and long hours of debugging. &lt;br /&gt;&lt;br /&gt;An alternative is to think of "intrusive" data structures. As the simplest example, think about a linked list. Instead of inserting the object to a bigger ListNode object like this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;template &amp;lt;typename Obj&gt;&lt;br /&gt;struct ListNode&lt;br /&gt;{&lt;br /&gt;Obj object;&lt;br /&gt;ListNode * next;&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We can embed the next pointer inside each object inside the list.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;struct Object&lt;br /&gt;{&lt;br /&gt;Object * next;&lt;br /&gt;...&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Read on... for more information: &lt;a href="http://www.codefarms.com/publications/intrusiv/Intrusive.pdf"&gt;http://www.codefarms.com/publications/intrusiv/Intrusive.pdf&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-8478612594519664236?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/8478612594519664236/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2010/10/intrusive-data-structures.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/8478612594519664236'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/8478612594519664236'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2010/10/intrusive-data-structures.html' title='Intrusive Data Structures'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-4396089039314646644</id><published>2010-08-30T18:01:00.000-07:00</published><updated>2010-10-20T13:26:42.907-07:00</updated><title type='text'>Template specialization is explicit instantiation</title><content type='html'>Fact: As the title says, whenever you specialize a templated function, you explicitly instantiate it as well. &lt;br /&gt;&lt;br /&gt;What's the practical implication? You can just compile the specialization separately and link it to the main program. This property was invaluable in my case where I wanted to use different compilers for different parts of my code (hence I can not just include every template function).  &lt;br /&gt;&lt;br /&gt;How do you do it?&lt;br /&gt;&lt;br /&gt;Inside main.cpp:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;template &amp;lt;unsigned TT&gt;&lt;br /&gt;void foo(/*bunch of parameters*/) {};  // dummy base template &lt;br /&gt;&lt;br /&gt;template &amp;lt;&gt;&lt;br /&gt;void foo&amp;lt;2&gt;(/*bunch of parameters*/); // only the declaration&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Inside side.cpp&lt;br /&gt;&lt;span style="font-style:italic;"&gt;template &amp;lt;unsigned TT&gt;&lt;br /&gt;void foo(/*bunch of parameters*/);  // declaration of base template &lt;br /&gt;&lt;br /&gt;template &amp;lt;&gt;&lt;br /&gt;void foo&amp;lt;2&gt;(/*bunch of parameters*/)&lt;br /&gt;{ /* lots of code */ }&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-4396089039314646644?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/4396089039314646644/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2010/08/template-specialization-is-explicit.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/4396089039314646644'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/4396089039314646644'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2010/08/template-specialization-is-explicit.html' title='Template specialization is explicit instantiation'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-6966463812671191639</id><published>2009-12-30T17:36:00.000-08:00</published><updated>2010-01-14T11:16:34.009-08:00</updated><title type='text'>Debugging MPI</title><content type='html'>Parallel debugging has come a long way. Today, it's quite possible to use -X and remotely debug a parallel MPI application running on a supercomputer. TotalView and DDT are just two example debuggers, their IDEs are pretty familiar for a seasoned C/C++ programmer. &lt;br /&gt;&lt;br /&gt;However, debugging MPI is still painful because it rarely does what it claims it does. When there is a crash, you only get to see the crash. The MPI function in which the crash occurred will probably never return an error code because it will exit the application first. It is claimed that you can override this default behavior in C++ by setting error handlers to MPI::ERRORS_THROW_EXCEPTIONS, but I am yet to see that happen in practice. &lt;br /&gt;&lt;br /&gt;What's left for me is to peek into the internals of the MPI objects to see if any fields (or the whole object) are uninitialized. But sometimes, there is really not such a thing as an MPI object as we know it, it is just a handle. For example, here is the implementation in MPICH2 and MVAPICH2:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;&lt;blockquote&gt;&lt;br /&gt;// in file mpi.h&lt;br /&gt;typedef int MPI_Win;&lt;br /&gt;&lt;br /&gt;// in file mpiimpl.h&lt;br /&gt;typedef struct MPID_Win {&lt;br /&gt;    int           handle;             /* value of MPI_Win for this structure */&lt;br /&gt;    volatile int  ref_count;&lt;br /&gt;    int fence_cnt;     /* 0 = no fence has been called; 1 = fence has been called */ &lt;br /&gt;    MPID_Errhandler *errhandler;  /* Pointer to the error handler structure */&lt;br /&gt;    void *base;&lt;br /&gt;    MPI_Aint    size;        &lt;br /&gt;    int          disp_unit;      /* Displacement unit of *local* window */&lt;br /&gt;    MPID_Attribute *attributes;&lt;br /&gt;    MPID_Group *start_group_ptr; /* group passed in MPI_Win_start */&lt;br /&gt;    int start_assert;            /* assert passed to MPI_Win_start */&lt;br /&gt;    MPI_Comm    comm;         /* communicator of window (dup) */&lt;br /&gt;    ...&lt;br /&gt;    char          name[MPI_MAX_OBJECT_NAME];  &lt;br /&gt;} MPID_Win;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;// In file mpicxx.h&lt;br /&gt;class Win  {&lt;br /&gt;  protected:&lt;br /&gt;    MPI_Win the_real_win;&lt;br /&gt;&lt;br /&gt;  public:&lt;br /&gt;    inline Win(MPI_Win obj) : the_real_win(obj) {}&lt;br /&gt;    inline Win(void) : the_real_win(MPI_WIN_NULL) {}&lt;br /&gt;    virtual ~Win() {}&lt;br /&gt;&lt;br /&gt;    Win(const Win &amp;obj) : the_real_win(obj.the_real_win){}&lt;br /&gt;    Win&amp; operator=(const Win &amp;obj) {&lt;br /&gt;      the_real_win = obj.the_real_win; return *this; }&lt;br /&gt;&lt;br /&gt;    // logical&lt;br /&gt;    bool operator== (const Win &amp;obj) {&lt;br /&gt;      return (the_real_win == obj.the_real_win); }&lt;br /&gt;    bool operator!= (const Win &amp;obj) {&lt;br /&gt;      return (the_real_win != obj.the_real_win); }&lt;br /&gt;    // C/C++ cast and assignment&lt;br /&gt;    inline operator MPI_Win*() { return &amp;the_real_win; }&lt;br /&gt;    inline operator MPI_Win() const { return the_real_win; }&lt;br /&gt;    Win&amp; operator=(const MPI_Win&amp; obj) {&lt;br /&gt;      the_real_win = obj; return *this; }&lt;br /&gt;...&lt;br /&gt;}&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;prior to any operation, the real objects are retrieved via these handles:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt; /* Convert MPI object handles to object pointers */&lt;br /&gt;    MPID_Win_get_ptr( win, win_ptr );&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;where&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;&lt;blockquote&gt;&lt;br /&gt;#define MPID_Win_get_ptr(a,ptr)        MPID_Get_ptr(Win,a,ptr)&lt;br /&gt;&lt;br /&gt;/* Convert handles to objects for MPI types that do _not_ have any predefined objects */&lt;br /&gt;#define MPID_Get_ptr(kind,a,ptr)     \&lt;br /&gt;{         \&lt;br /&gt;   switch (HANDLE_GET_KIND(a)) {     \&lt;br /&gt;      case HANDLE_KIND_DIRECT:      \&lt;br /&gt;          ptr=MPID_##kind##_direct+HANDLE_INDEX(a);   \&lt;br /&gt;          break;       \&lt;br /&gt;      case HANDLE_KIND_INDIRECT:     \&lt;br /&gt;          ptr=((MPID_##kind*)      \&lt;br /&gt;               MPIU_Handle_get_ptr_indirect(a,&amp;MPID_##kind##_mem)); \&lt;br /&gt;          break;       \&lt;br /&gt;      case HANDLE_KIND_INVALID:      \&lt;br /&gt;      case HANDLE_KIND_BUILTIN:      \&lt;br /&gt;      default:        \&lt;br /&gt;          ptr=0;       \&lt;br /&gt;          break;       \&lt;br /&gt;     }         \&lt;br /&gt;}&lt;/blockquote&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So what's the story behind this design?&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;MPI Opaque objects such as 'MPI_Comm' or 'MPI_Datatype' are specified by integers (in the MPICH2 implementation); the MPI standard calls these handles.  Out of range values are invalid; the value 0 is reserved. For most (with the possible exception of 'MPI_Request' for performance reasons) MPI Opaque objects, the integer encodes both the kind of object (allowing runtime tests to detect a datatype passed where a communicator is expected) and important properties of the object.  Even the 'MPI_xxx_NULL' values should be encoded so that different null handles can be distinguished.  The details of the encoding of the handles is covered in more detail in the MPICH2 Design Document. For the most part, the ADI uses pointers to the underlying structures rather than the handles themselves.  However, each structure contains an 'handle' field that is the corresponding integer handle for the MPI object.&lt;br /&gt;MPID objects (objects used within the implementation of MPI) are not opaque.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Here is the &lt;a href="http://phase.hpcc.jp/mirrors/mpi/mpich2/docs/mpich2.pdf"&gt;MPICH2 design document&lt;/a&gt; for the interested.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Anyway, if you were hopeless enough to look at the design document, you'd see what kind of a black magic is necessary to make sense out of it. This might make sense as it is automatic encapsulation of data, one doesn't even need to use classes and declare its members private, because no one can access that object without deciphering  what it is first. Fortunately, there are alternative MPI implementations, such as OpenMPI, that uses real (well, sort of) objects. Here is how MPI_Win is (partially) declared in OpenMPI:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;&lt;blockquote&gt;&lt;br /&gt;struct ompi_win_t {&lt;br /&gt;    opal_object_t w_base;&lt;br /&gt;    opal_mutex_t  w_lock;&lt;br /&gt;&lt;br /&gt;    /* Group associated with this window. */&lt;br /&gt;    ompi_group_t *w_group;&lt;br /&gt;&lt;br /&gt;    /* Information about the state of the window.  */&lt;br /&gt;    uint16_t w_flags;&lt;br /&gt;&lt;br /&gt;    /* Error handling.  This field does not have the "w_" prefix so that the OMPI_ERRHDL_* macros can find it, regardless of whether it's a comm, window, or file. */&lt;br /&gt;    ompi_errhandler_t                    *error_handler;&lt;br /&gt;    ompi_errhandler_type_t            errhandler_type;&lt;br /&gt;&lt;br /&gt;    /* displacement factor */&lt;br /&gt;    int w_disp_unit;&lt;br /&gt;&lt;br /&gt;    void *w_baseptr;&lt;br /&gt;    size_t w_size;&lt;br /&gt;&lt;br /&gt;    /** Current epoch / mode (access, expose, lock, etc.).  Checked by the argument checking code in the MPI layer, set by the OSC component.  Modified without locking w_lock. */&lt;br /&gt;    volatile uint16_t w_mode;&lt;br /&gt;&lt;br /&gt;    /* one sided interface */&lt;br /&gt;    ompi_osc_base_module_t *w_osc_module;&lt;br /&gt;};&lt;/blockquote&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Once I call &lt;span style="font-weight:bold;"&gt;Post&lt;/span&gt; on a &lt;span style="font-weight:bold;"&gt;Window&lt;/span&gt;, it's w_mode becomes 34 because it is 22 in hexadecimal which means it has been posted and the exposure epoch has actually started. And after I successfully call &lt;span style="font-weight:bold;"&gt;Start&lt;/span&gt; on the same window, its w_mode because 99 which 63 in hexadecimal, telling me that ACCESS_EPOCH and STARTED are now also true.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;&lt;br /&gt;&lt;blockquote&gt;/* mode */&lt;br /&gt;#define OMPI_WIN_ACCESS_EPOCH 0x00000001&lt;br /&gt;#define OMPI_WIN_EXPOSE_EPOCH 0x00000002&lt;br /&gt;#define OMPI_WIN_FENCE        0x00000010&lt;br /&gt;#define OMPI_WIN_POSTED       0x00000020&lt;br /&gt;#define OMPI_WIN_STARTED      0x00000040&lt;br /&gt;#define OMPI_WIN_LOCK_ACCESS  0x00000080&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Just to decipher more... (you should never proceed this far unless you are 100% sure that the it's the MPI implementation's bug, not yours)&lt;br /&gt;w_osc_module contains all the necessary function pointers implemented by the osc (One-Sided Component). &lt;br /&gt;For example the post function looks like the following:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;// in file osc_pt2pt_sync.c&lt;br /&gt;int ompi_osc_pt2pt_module_post(ompi_group_t *group, int assert, ompi_win_t *win)&lt;br /&gt;{&lt;br /&gt;    int i;&lt;br /&gt;    ompi_osc_pt2pt_module_t *module = P2P_MODULE(win);&lt;br /&gt;&lt;br /&gt;    OBJ_RETAIN(group);&lt;br /&gt;    ompi_group_increment_proc_count(group);&lt;br /&gt;&lt;br /&gt;    OPAL_THREAD_LOCK(&amp;(module-&gt;p2p_lock));&lt;br /&gt;    assert(NULL == module-&gt;p2p_pw_group);&lt;br /&gt;    module-&gt;p2p_pw_group = group;    &lt;br /&gt;&lt;br /&gt;    /* Set our mode to expose w/ post */&lt;br /&gt;    ompi_win_remove_mode(win, OMPI_WIN_FENCE);&lt;br /&gt;    ompi_win_append_mode(win, OMPI_WIN_EXPOSE_EPOCH | OMPI_WIN_POSTED);&lt;br /&gt;&lt;br /&gt;    /* list how many complete counters we're still waiting on */&lt;br /&gt;    module-&gt;p2p_num_complete_msgs +=&lt;br /&gt;        ompi_group_size(module-&gt;p2p_pw_group);&lt;br /&gt;    OPAL_THREAD_UNLOCK(&amp;(module-&gt;p2p_lock));&lt;br /&gt;&lt;br /&gt;    /* send a hello counter to everyone in group */&lt;br /&gt;    for (i = 0 ; i &lt; ompi_group_size(module-&gt;p2p_pw_group) ; ++i) {&lt;br /&gt;        ompi_osc_pt2pt_control_send(module, ompi_group_peer_lookup(group, i), &lt;br /&gt;                                                      OMPI_OSC_PT2PT_HDR_POST, 1, 0);&lt;br /&gt;    }&lt;br /&gt;    return OMPI_SUCCESS;&lt;br /&gt;}&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In reality, some of these functions do very minimal communication. For example, all &lt;span style="font-weight:bold;"&gt;Start&lt;/span&gt; does is to find the rank of each process within the specified group, store the indices and set the true/false flag in the active ranks table.&lt;br /&gt;&lt;br /&gt;One thing to notice is the P2P_MODULE() call that basically casts &lt;span style="font-weight:bold;"&gt;w_osc_module&lt;/span&gt; pointer to type &lt;span style="font-style:bold;"&gt;ompi_osc_pt2pt_module_t *&lt;/span&gt;, which has the following interesting components (ok, interesting from a general active-target synchronization perspective):&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;&lt;blockquote&gt;// inside ompi/mca/osc/pt2pt/osc_pt2pt.h&lt;br /&gt;struct ompi_osc_pt2pt_module_t {&lt;br /&gt;    /** Extend the basic osc module interface */&lt;br /&gt;    ompi_osc_base_module_t super;  // ABAB: See the ac-hoc inheritance? - Lovely !&lt;br /&gt;&lt;br /&gt;    /** pointer back to window */&lt;br /&gt;    ompi_win_t *p2p_win;&lt;br /&gt;&lt;br /&gt;    /** communicator created with this window */&lt;br /&gt;    ompi_communicator_t *p2p_comm;&lt;br /&gt;&lt;br /&gt;    /** control message receive request */&lt;br /&gt;    struct ompi_request_t *p2p_cb_request;&lt;br /&gt;&lt;br /&gt;    opal_list_t p2p_pending_control_sends;&lt;br /&gt;&lt;br /&gt;    /** list of ompi_osc_pt2pt_sendreq_t structures, and includes all requests for this access epoch that have not already been started.  p2p_lock must be held when modifying this field. */&lt;br /&gt;    opal_list_t p2p_pending_sendreqs;&lt;br /&gt;&lt;br /&gt;    /** list of unsigned int counters for the number of requests to a particular rank in p2p_comm for this access epoc.  p2p_lock must be held when modifying this field */&lt;br /&gt;    unsigned int *p2p_num_pending_sendreqs;&lt;br /&gt;&lt;br /&gt;    /** For MPI_Fence synchronization, the number of messages to send in epoch.  For Start/Complete, the number of updates for this Complete.  For lock, the number of messages waiting for completion on on the origin side.  Not protected by p2p_lock - must use atomic counter operations. */&lt;br /&gt;    volatile int32_t p2p_num_pending_out;&lt;br /&gt;&lt;br /&gt;    /** For MPI_Fence synchronization, the number of expected incoming messages.  For Post/Wait, the number of expected updates from complete. For lock, the number of messages on the passive side we are waiting for.  Not protected by p2p_lock - must use atomic counter operations. */&lt;br /&gt;    volatile int32_t p2p_num_pending_in;&lt;br /&gt;&lt;br /&gt;    /** Number of "ping" messages from the remote post group we've received */&lt;br /&gt;    volatile int32_t p2p_num_post_msgs;&lt;br /&gt;&lt;br /&gt;    /** Number of "count" messages from the remote complete group we've received */&lt;br /&gt;    volatile int32_t p2p_num_complete_msgs;&lt;br /&gt;&lt;br /&gt;   ...&lt;br /&gt;&lt;br /&gt;   /********************** PWSC data *************************/&lt;br /&gt;    struct ompi_group_t *p2p_pw_group;&lt;br /&gt;    struct ompi_group_t *p2p_sc_group;&lt;br /&gt;    bool *p2p_sc_remote_active_ranks;&lt;br /&gt;    int *p2p_sc_remote_ranks;&lt;br /&gt;&lt;br /&gt;   ...&lt;br /&gt;};&lt;br /&gt;typedef struct ompi_osc_pt2pt_module_t ompi_osc_pt2pt_module_t;&lt;/blockquote&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-6966463812671191639?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/6966463812671191639/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2009/12/debugging-mpi.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/6966463812671191639'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/6966463812671191639'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2009/12/debugging-mpi.html' title='Debugging MPI'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-6071755914711837724</id><published>2009-07-29T19:40:00.000-07:00</published><updated>2009-07-29T20:00:37.005-07:00</updated><title type='text'>Initializations in C++</title><content type='html'>Initialization has been a vague part of my C++ knowledge. Although I always knew that most user defined classes has appropriate constructors, so that I can expect reasonable behavior (e.g. std::string initializes to an empty string "" ), this hasn't been the case for built-in arrays of vectors. &lt;br /&gt;&lt;br /&gt;What do you expect from this code:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;float * test = new float[10];&lt;br /&gt;copy(test, test+10, ostream_iterator&amp;lt;float&gt;( cout, " ");&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Honestly, I expect anything to happen though it sometimes behaves reasonably and spits out zeros. As a matter of fact, valgrind will list hundreds of errors for this tiny peace of code, most of which are of the form &lt;span style="font-style:italic;"&gt;"Conditional jump or move depends on uninitialised value"&lt;/span&gt;. The way to make sure everything is zero initialized is to use std::fill or std::fill_n (but please, no memset) after the allocation. On the other hand, this is not always possible. &lt;br /&gt;&lt;br /&gt;Assume you have a templated function which takes an array of T's where T can be pretty much anything. If it is a non-POD (plain old data) class, then you can look at the documentation and make sure the default constructor does what you want it to do (in my case, assignment to zero). But how to handle built-in types without writing a specialization to cover them? std:fill trick won't work since it might not make sense to the non-POD class you're dealing with. &lt;br /&gt;&lt;br /&gt;Here is the &lt;a href="http://www.boost.org/doc/libs/1_39_0/libs/utility/value_init.htm#details"&gt; loophole &lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;float * test = new float[10] () ;    &lt;br /&gt;copy(test, test+10, ostream_iterator&amp;lt;float&gt;( cout, " ");&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Note the empty set of parantheses as the initializer, which makes the array elements default constructed and the C++ standard says that a default constructed POD type is zero-initialized. Neat, huh? (btw, valgrind reports 0 errors this time)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-6071755914711837724?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/6071755914711837724/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2009/07/initializations-in-c.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/6071755914711837724'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/6071755914711837724'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2009/07/initializations-in-c.html' title='Initializations in C++'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-6065392845119123617</id><published>2009-07-29T16:54:00.000-07:00</published><updated>2009-11-21T14:45:18.612-08:00</updated><title type='text'>SSE and auto-vectorization in g++</title><content type='html'>I have been implementing a SpMV with multiple right hand side vectors (and hence multiple left hand side vectors). For each A[i][j], that translated into performing (assuming that rhs &amp; lhs are represented as tuples of std::vectors):&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;get&amp;lt;0&gt;(lvecs)[i]  += A[i][j] * get&amp;lt;0&gt;(rvecs)[j]; &lt;br /&gt;get&amp;lt;1&gt;(lvecs)[i]  += A[i][j] * get&amp;lt;1&gt;(rvecs)[j]; &lt;br /&gt;get&amp;lt;2&gt;(lvecs)[i]  += A[i][j] * get&amp;lt;2&gt;(rvecs)[j]; &lt;br /&gt;get&amp;lt;3&gt;(lvecs)[i]  += A[i][j] * get&amp;lt;3&gt;(rvecs)[j]; &lt;br /&gt;...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I have initially thought about 3-4 different ways of implementing this. One is just to rely on the automatic vectorization capabilities of g++ and hint the compiler to unroll the loop using templates, as this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;template &amp;lt;int D, typename T&gt;&lt;br /&gt;void saxpy(T a, T * __restrict b, T * __restrict c)&lt;br /&gt;{&lt;br /&gt; for(int i=0; i&amp;lt;D; ++i)&lt;br /&gt; {&lt;br /&gt;  c[i] +=  a* b[i];&lt;br /&gt; } &lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int a = 2;&lt;br /&gt;int b[BETA];&lt;br /&gt;int c[BETA];&lt;br /&gt;saxpy&amp;lt;BETA&gt;(a, b, c);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Performing saxpy 1 million times with BETA=16 takes about 9.97 milliseconds on our Opteron and 14.35 milliseconds on my Macbook's Core2Duo processor. These are the best I got, by using the -O3 optimization.  &lt;br /&gt;&lt;br /&gt;Then I said, let's try to force some SSE through &lt;a href="http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Vector-Extensions.html#Vector-Extensions"&gt;gcc vector extensions&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;typedef int vpackedsi __attribute__ ((vector_size (BETA*sizeof(int))));   // 64-bytes, a full cache line !&lt;br /&gt;union ipackedvector &lt;br /&gt;{&lt;br /&gt;   vpackedsi v;&lt;br /&gt;   int f[BETA];&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;ipackedvector av, bv, cv;&lt;br /&gt;cv.v += av.v * bv.v; &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Or course, the drawback here is that a also needs to be a vector since operations between scalar and vector types are not allowed (you'll get &lt;span style="font-style:italic;"&gt;error: invalid operands of types 'int' and 'int __vector__' to binary 'operator*'&lt;/span&gt;) but I assume the automatic vectorization in gcc does the same packing behind the covers (if it is using sse/simd at all)&lt;br /&gt;&lt;br /&gt;Unfortunately, the second approach was slower, 16.712 milliseconds on the Opteron and 24.18 milliseconds on the Core2Duo. Exactly 67% slower on both architectures (which kind of surprised me). After taking a look at the generated assembly code, I see that the latter approach uses SSE4 instructions such as &lt;a href="http://msdn.microsoft.com/en-us/library/bb514039.aspx"&gt;pmuldq&lt;/a&gt;, and the former uses operations on floating points such as addsd although the code is supposed to use integer. Maybe, it is using FPU pipelines for better performance, gcc must be smarter than I am.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-6065392845119123617?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/6065392845119123617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2009/07/sse-and-auto-vectorization-in-g.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/6065392845119123617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/6065392845119123617'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2009/07/sse-and-auto-vectorization-in-g.html' title='SSE and auto-vectorization in g++'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-1482294564294272529</id><published>2009-07-24T15:17:00.000-07:00</published><updated>2009-07-30T21:30:28.190-07:00</updated><title type='text'>C++ Performance for HPC, part #1</title><content type='html'>I deliberately chose this controversial title as it is one of the most debated issues. I also avoided saying C/C++ as no such language exists. C is C, and C++ is C++.  I hate the famous "do one simple run, measure it, draw far-reaching conclusions from that simple experiment" cycle. &lt;br /&gt;&lt;br /&gt;This basic example is about the debate on array of structs vs. struct of arrays. You'll hear that an array of structs approach is superior if you are accessing all the entries of the struct simultaneously. This is basically because when you access StrArray[i].first, the other entries of the struct (say StrArray[i].second and StrArray[i].third) will be fetched to the cache automatically and you'll likely to avoid a cache miss when you access them in the next line. However, this only makes sense if you're doing random access to the array. On the other hand, if you're simply streaming through the array, the pressure on the memory subsystem and the caches are simply the same. Because in that case, nothing that you bring to the cache is wasted, i.e. spatial locality is great when you access data sequentially. In that case, the winner of the battle between array of structs and struct of arrays will be determined by lots of other issues such as array length (are there any conflict misses?), cache sizes, replacement policy, compiler, etc. Here is my test:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;struct mystruct &lt;br /&gt;{&lt;br /&gt; double first;&lt;br /&gt; double second;&lt;br /&gt; double third;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;struct StrOfArrays&lt;br /&gt;{&lt;br /&gt; StrOfArrays(int size)&lt;br /&gt; {&lt;br /&gt;  firsts = new double[size];&lt;br /&gt;  seconds = new double[size];&lt;br /&gt;  thirds = new double[size];&lt;br /&gt; }&lt;br /&gt; ~StrArrays()&lt;br /&gt; {&lt;br /&gt;  DeleteAll(firsts, seconds, thirds);&lt;br /&gt; }&lt;br /&gt; double * firsts;&lt;br /&gt; double * seconds;&lt;br /&gt; double * thirds;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int * linear = new int[TESTSIZE];&lt;br /&gt;int * random = new int[TESTSIZE];&lt;br /&gt;__gnu_cxx::iota(linear, linear+TESTSIZE, 0);&lt;br /&gt;__gnu_cxx::iota(random, random+TESTSIZE, 0);&lt;br /&gt;random_shuffle(random, random+TESTSIZE);&lt;br /&gt;&lt;br /&gt;StrOfArrays strofarrays (TESTSIZE);&lt;br /&gt;mystruct * arrayofstrs = new mystruct[TESTSIZE];&lt;br /&gt;&lt;br /&gt;gettimeofday(&amp;tim, NULL);&lt;br /&gt;t1=tim.tv_sec+(tim.tv_usec/1000000.0);&lt;br /&gt;for(int i=0; i&amp;lt;TESTSIZE; ++i)&lt;br /&gt;{&lt;br /&gt; arrayofstrs[linear[i]].first += 1;&lt;br /&gt; arrayofstrs[linear[i]].second += 2;&lt;br /&gt; arrayofstrs[linear[i]].third += 3;&lt;br /&gt;}&lt;br /&gt;gettimeofday(&amp;tim, NULL);&lt;br /&gt;t2=tim.tv_sec+(tim.tv_usec/1000000.0);&lt;br /&gt;printf("%.6lf seconds elapsed for streaming array of structs\n", t2-t1);&lt;br /&gt;&lt;br /&gt;gettimeofday(&amp;tim, NULL);&lt;br /&gt;t1=tim.tv_sec+(tim.tv_usec/1000000.0);&lt;br /&gt;for(int i=0; i&amp;lt;TESTSIZE; ++i)&lt;br /&gt;{&lt;br /&gt; strofarrays.firsts[linear[i]] += 1;&lt;br /&gt; strofarrays.seconds[linear[i]] += 2;&lt;br /&gt; strofarrays.thirds[linear[i]] += 3;&lt;br /&gt;}&lt;br /&gt;gettimeofday(&amp;tim, NULL);&lt;br /&gt;t2=tim.tv_sec+(tim.tv_usec/1000000.0);&lt;br /&gt;printf("%.6lf seconds elapsed for streaming struct of arrays\n", t2-t1);&lt;br /&gt;&lt;br /&gt;gettimeofday(&amp;tim, NULL);&lt;br /&gt;t1=tim.tv_sec+(tim.tv_usec/1000000.0);&lt;br /&gt;for(int i=0; i&amp;lt;TESTSIZE; ++i)&lt;br /&gt;{&lt;br /&gt;        arrayofstrs[random[i]].first += 1;&lt;br /&gt;        arrayofstrs[random[i]].second += 2;&lt;br /&gt;        arrayofstrs[random[i]].third += 3;&lt;br /&gt;}&lt;br /&gt;gettimeofday(&amp;tim, NULL);&lt;br /&gt;t2=tim.tv_sec+(tim.tv_usec/1000000.0);&lt;br /&gt;printf("%.6lf seconds elapsed for randomly accessing array of structs\n", t2-t1);&lt;br /&gt;&lt;br /&gt;gettimeofday(&amp;tim, NULL);&lt;br /&gt;t1=tim.tv_sec+(tim.tv_usec/1000000.0);&lt;br /&gt;for(int i=0; i&amp;lt;TESTSIZE; ++i)&lt;br /&gt;{&lt;br /&gt;        strofarrays.firsts[random[i]] += 1;&lt;br /&gt;        strofarrays.seconds[random[i]] += 2;&lt;br /&gt;        strofarrays.thirds[random[i]] += 3;&lt;br /&gt;}&lt;br /&gt;gettimeofday(&amp;tim, NULL);&lt;br /&gt;t2=tim.tv_sec+(tim.tv_usec/1000000.0);&lt;br /&gt;printf("%.6lf seconds elapsed for randomly accessing struct of arrays\n", t2-t1);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Results on Opteron 8378:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;TESTSIZE=1 million&lt;/span&gt;&lt;br /&gt;0.013407 seconds elapsed for streaming array of structs&lt;br /&gt;0.009285 seconds elapsed for streaming struct of arrays&lt;br /&gt;0.050337 seconds elapsed for randomly accessing array of structs&lt;br /&gt;0.079531 seconds elapsed for randomly accessing struct of arrays&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;TESTSIZE=10 million&lt;/span&gt;&lt;br /&gt;0.135137 seconds elapsed for streaming array of structs&lt;br /&gt;0.129781 seconds elapsed for streaming struct of arrays&lt;br /&gt;0.579303 seconds elapsed for randomly accessing array of structs&lt;br /&gt;1.257762 seconds elapsed for randomly accessing struct of arrays&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;TESTSIZE=100 million&lt;/span&gt;&lt;br /&gt;1.348777 seconds elapsed for streaming array of structs&lt;br /&gt;0.917530 seconds elapsed for streaming struct of arrays&lt;br /&gt;12.087753 seconds elapsed for randomly accessing array of structs&lt;br /&gt;32.038711 seconds elapsed for randomly accessing struct of arrays&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;As seen, for basic streaming (sequential access), the claim of array of structs being superior is not true, at least on our Opteron. However, when we move to random access, you're better of opting for an array of structs approach. Translating this to real application: &lt;br /&gt;If you're doing a sparse matrix X dense vector multiply using triples (row_id, col_id, num_val), stick to your struct of arrays. &lt;br /&gt;If you're doing a sparse matrix X sparse matrix multiply, then you're better off with array of structs because this time you have a lot of random access.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-1482294564294272529?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/1482294564294272529/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2009/06/c-performance-for-hpc-part-1.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/1482294564294272529'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/1482294564294272529'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2009/06/c-performance-for-hpc-part-1.html' title='C++ Performance for HPC, part #1'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-1745121036669435553</id><published>2009-06-29T00:48:00.000-07:00</published><updated>2009-06-29T01:03:36.571-07:00</updated><title type='text'>Accessor functions</title><content type='html'>In C++, a temporary cannot be passed by non-const reference. If you are slightly inexperienced and (for the sake of encapsulation) writing your accessors like this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;class Object&lt;br /&gt;{&lt;br /&gt;....&lt;br /&gt;MPI::Intracomm GetRowWorld() { return rowWorld; }&lt;br /&gt;MPI::Intracomm GetColWorld() { return colWorld; }&lt;br /&gt;...&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;You'll be in deep trouble when you want to send these return values to a function. Such as this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;void foo(MPI::Intracomm &amp; rowcomm, MPI::Intracomm &amp; colcomm)&lt;br /&gt;{&lt;br /&gt;...&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;Object A;&lt;br /&gt;foo(  A.GetRowWorld(), A.GetColWorld());   &lt;br /&gt;// Won't compile (and you should be happy about it)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;So, what's the fix? You can try to change the signature of foo:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;void foo(const MPI::Intracomm &amp; rowcomm, const MPI::Intracomm &amp; colcomm);&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;You might think this is smarter as it compiles but now you are in even deeper trouble because your accessor functions return values and they go out of scope immediately. Therefore, when foo starts executing, the references are dangling. You might try to change the signature to this:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;void foo(MPI::Intracomm rowcomm, MPI::Intracomm colcomm);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This works, but hey it might be terribly inefficient if MPI::Intracomm is a big object because you're creating temporaries everytime you pass by value. Just leave the signature as is and go fix your class definition instead:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;class Object&lt;br /&gt;{&lt;br /&gt;....&lt;br /&gt;MPI::Intracomm &amp; GetRowWorld() { return rowWorld; }&lt;br /&gt;MPI::Intracomm &amp; GetColWorld()  { return colWorld; }&lt;br /&gt;MPI::Intracomm GetRowWorld() const { return rowWorld; }&lt;br /&gt;MPI::Intracomm GetColWorld() const { return colWorld; }&lt;br /&gt;...&lt;br /&gt;}&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-1745121036669435553?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/1745121036669435553/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2009/06/accessor-functions.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/1745121036669435553'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/1745121036669435553'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2009/06/accessor-functions.html' title='Accessor functions'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-8751117072785423846</id><published>2009-06-18T21:50:00.000-07:00</published><updated>2009-06-18T22:47:44.011-07:00</updated><title type='text'>static_cast and CRTP</title><content type='html'>I have been using CRTP (curiously recurring template pattern) for my Sparse Matrix classes for a number of reasons:&lt;br /&gt;&lt;br /&gt;1) Performance, as this is a high-performance computing issue.&lt;br /&gt;2) Parameter type checking without the need to inspect RTTI (run-time type information).&lt;br /&gt;&lt;br /&gt;Here is the deal:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;template &amp;lt; class DER &gt;&lt;br /&gt;class SpMat &lt;br /&gt;{ ... }&lt;br /&gt;&lt;br /&gt;template &amp;lt; class DER&gt;&lt;br /&gt;template &amp;lt;typename SR&gt;&lt;br /&gt;void SpMat&amp;lt;DER&gt;::SpGEMM(SpMat&amp;lt;DER&gt; &amp; A,  SpMat&amp;lt;DER&gt; &amp; B, ...)&lt;br /&gt;{&lt;br /&gt;// check for conformance, etc.&lt;br /&gt;...&lt;br /&gt;static_cast&amp;lt; DER* &gt;(this)-&gt;template PlusEq_AtXBt&amp;lt;SR&gt;(static_cast&amp;lt; DER &gt;(A), static_cast&amp;lt; DER &gt;(B));&lt;br /&gt;...&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;class SpDCCols: public SpMat &amp;lt;SpDCCols &gt;&lt;br /&gt;{&lt;br /&gt;public:&lt;br /&gt;template &amp;lt;typename SR&gt; &lt;br /&gt;int PlusEq_AtXBt(const SpDCCols &amp; A, const SpDCCols &amp; B);  &lt;br /&gt;....&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This won't compile because &lt;span style="font-style:italic;"&gt;static_cast&amp;lt; DER &gt;(A)&lt;/span&gt; fails as no explicit conversion operator from &lt;span style="font-style:italic;"&gt;SpMat&amp;lt; SpDCCols &gt;&lt;/span&gt; to &lt;span style="font-style:italic;"&gt;SpDCCols&lt;/span&gt; is supplied, although the latter class is derived from the former. This also has to do with the semantics of static_cast() operator:&lt;br /&gt;Stroustrup says "For a built-in type T, T(e) is equivalent to static_cast(e)"&lt;br /&gt;&lt;br /&gt;So, are we gonna provide a conversion constructor? Of course not. One can convert a pointer/reference of a type A to a pointer/reference of a type B if A is a base class of B. &lt;br /&gt;Therefore, the immediate solution is to cast to a reference rather than a concrete object, which will also avoid unnecessary explicit conversion and make your code faster:&lt;br /&gt; &lt;span style="font-style:italic;"&gt;static_cast&amp;lt; DER &amp; &gt;(A)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-8751117072785423846?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/8751117072785423846/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2009/06/staticcast-and-crtp.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/8751117072785423846'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/8751117072785423846'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2009/06/staticcast-and-crtp.html' title='static_cast and CRTP'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-3179755217814186755</id><published>2009-06-16T20:11:00.001-07:00</published><updated>2009-06-16T21:00:45.087-07:00</updated><title type='text'>Template argument deduction</title><content type='html'>C++ has an excellent mechanism to deduce arguments. However, it is not free of pitfalls. Suppose you are writing a library function to perform a matrix-vector multiplication on a particular semiring (which is defined as a class with static ::multiply and ::add functions):&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;// Perform y &lt;- y + Ax  template &amp;lt; typename T, typename SR&gt;&lt;br /&gt;void MultiplyAdd(Matrix&amp;lt;T&gt; &amp;amp; A, Vector&amp;lt;T&gt; &amp;amp; x, Vector&amp;lt;T&gt; &amp;amp; y){&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;.... // usual conformance checks&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;for(int i=0; i&amp;lt; A.m; ++i){&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;for(int j=0; j&amp;lt; A.n; ++j){&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;temp = SR::multiply(A[i][j], x[j]);&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;y[i] = SR::add(temp, y[i]);&lt;br /&gt;}}} &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here,the template argument T can be deduced from the function arguments but SR can not. If you release your library like this, people will need to explicitly call it like:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;MultiplyAdd &amp;lt;int, minplus&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;which is kind of annoying. Normally, you'd like to list deducible arguments first so that the user can get away with declaring only the non-deducible ones:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;MultiplyAdd &amp;lt;minplus&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;----------------------&lt;br /&gt;&lt;br /&gt;My second point will be on using function pointers in STL library calls. If you are using a templated function pointer, don't expect the compiler to deduce the arguments for you. Example:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;template &amp;lt;typename T&gt;&lt;br /&gt;bool compare_second(pair&amp;lt;T,T&gt; t1, pair&amp;lt;T,T&gt; t2)&lt;br /&gt;{&lt;br /&gt;return t1.second &amp;lt; t2.second;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;const int N = 100;&lt;br /&gt;pair pairarray[N];&lt;br /&gt;.... // somehow populate the array&lt;br /&gt;sort(pairarray, pairarray +N, compare_second); // cryptic compiler error!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The correct way to do it is obviously:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;sort(pairarray, pairarray +N, compare_second&amp;lt;int&gt;);&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-3179755217814186755?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/3179755217814186755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2009/06/template-argument-deduction.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/3179755217814186755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/3179755217814186755'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2009/06/template-argument-deduction.html' title='Template argument deduction'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-2739049706959219494</id><published>2008-09-26T13:23:00.000-07:00</published><updated>2009-06-14T18:30:33.344-07:00</updated><title type='text'>How to get rid of dot/slash when running executables on Linux</title><content type='html'>... and why you shouldn't be doing this unless it is absolutely necessary.&lt;br /&gt;&lt;br /&gt;But anyway, sometimes there is this old/legacy program that tries to execute another program inside the same directory without the dot/slash, and we don't have access to its source code or we don't wanna access its source code (for reasons apparent to a CS person)&lt;br /&gt;&lt;br /&gt;Here: &lt;a href="http://www.linfo.org/dot_slash.html"&gt;http://www.linfo.org/dot_slash.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Have fun.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-2739049706959219494?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/2739049706959219494/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2008/09/how-to-get-rid-of-dotslash-when-running.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/2739049706959219494'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/2739049706959219494'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2008/09/how-to-get-rid-of-dotslash-when-running.html' title='How to get rid of dot/slash when running executables on Linux'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-2902757584473669793</id><published>2008-01-16T22:06:00.000-08:00</published><updated>2009-06-14T18:30:33.344-07:00</updated><title type='text'>Heap Memory in Multithreaded NUMA Programs</title><content type='html'>Ok, here is the deal: You want to parallelize you application (say matrix multiplication) and you want asynchronous computation. If you have a shared memory NUMA machine, threads seem to be the obvious choice. The caveat is memory management. Calls to "new" and "delete" are serialized because heap memory is always shared among threads. Also, and probably more costly, there are memory contentions, false sharing, etc. &lt;br /&gt;I tried to use &lt;a href="http://www.hoard.org"&gt;hoard library&lt;/a&gt; here is what happened:&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;br /&gt;Without hoard, using plain GNU C++ library (which might be using ptmalloc?)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Loading Matrices&lt;br /&gt;Loaded !&lt;br /&gt;Loading took 14.019800 seconds&lt;br /&gt;Multiplications started&lt;br /&gt;Transposition took 0.423375 seconds&lt;br /&gt;Multiplication took 4.816739 seconds&lt;br /&gt;Retransposition took 0.486832 seconds&lt;br /&gt;Multiplications finished&lt;br /&gt;8.713002 seconds elapsed (including thread creation cost)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Using hoard:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Loading Matrices&lt;br /&gt;Loaded !&lt;br /&gt;Loading took 2.489643 seconds&lt;br /&gt;Multiplications started&lt;br /&gt;Transposition took 0.463726 seconds&lt;br /&gt;Multiplication took 5.337530 seconds&lt;br /&gt;Retransposition took 0.493123 seconds&lt;br /&gt;Multiplications finished&lt;br /&gt;9.304911 seconds elapsed (including thread creation cost) &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;What's wrong here? Hoard makes loading matrices really fast (2.4 instead of 14 seconds), but it slowed down the multiplication at the same time (5.3 sec instead of 4.8).&lt;br /&gt;Note that the code uses 16 threads on a 16 core machine.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-2902757584473669793?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/2902757584473669793/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2008/01/heap-memory-in-multithreaded-numa.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/2902757584473669793'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/2902757584473669793'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2008/01/heap-memory-in-multithreaded-numa.html' title='Heap Memory in Multithreaded NUMA Programs'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-902238399507713352</id><published>2008-01-14T11:03:00.000-08:00</published><updated>2009-06-14T18:30:33.344-07:00</updated><title type='text'>Boost::bind</title><content type='html'>Boost::bind library is a useful library that I usually use for my multithreaded code, even though it is not really required in any sense. Instead of writing my own function objects to be called by boost::thread, I delegate the job to boost::bind. &lt;br /&gt;However, there is a caveat and I want to share it with you today.&lt;br /&gt;&lt;br /&gt;"The arguments that bind takes are copied and held internally by the returned function object".&lt;br /&gt;So,that means the copy constructors are called for your arguments. If you intentionally used reference parameters to avoid data copying during function calls, using boost::bind blindly just nullifies your efforts :(&lt;br /&gt;&lt;br /&gt;Two solutions exist as far as I know:&lt;br /&gt;&lt;br /&gt;1) Plain and simple, pass pointers instead. Ok, this is too C-like.&lt;br /&gt;2) Force bind to hold a reference instead of calling the copy constructors and keeping copies of function arguments.boost::ref and boost::cref does that for you&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-902238399507713352?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/902238399507713352/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2008/01/boostbind.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/902238399507713352'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/902238399507713352'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2008/01/boostbind.html' title='Boost::bind'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-8936411176183398004</id><published>2008-01-13T13:23:00.000-08:00</published><updated>2009-06-14T18:30:33.344-07:00</updated><title type='text'>Analytical Geometry vs. Programming Languages</title><content type='html'>I guess anyone with a little bit of programming experience knows the storage of multidimensional arrays. For simplicity, I will just deal with 2D arrays. &lt;br /&gt;In C,C++,etc. memory storage is row-major ordered, meaning that rows are stored consecutively. A 3X4 grid would be stored in the memory as follows:&lt;br /&gt;&lt;br /&gt;1  2  3  4&lt;br /&gt;5  6  7  8&lt;br /&gt;9 10 11 12&lt;br /&gt;&lt;br /&gt;In Fortran or Matlab, that would be:&lt;br /&gt;&lt;br /&gt;1  5  9 &lt;br /&gt;2  6 10 &lt;br /&gt;3  7 11 &lt;br /&gt;4  8 12 &lt;br /&gt;&lt;br /&gt;So now, my old knowledge from analytical geometry forces me to think about the following axis system:&lt;br /&gt;&lt;br /&gt;y&lt;br /&gt; |&lt;br /&gt; |&lt;br /&gt; |&lt;br /&gt; |&lt;br /&gt; |&lt;br /&gt; ---------------- x &lt;br /&gt;&lt;br /&gt;That certainly doesn't fit to the row-major storage system. I have seen so many people with math background that would access the ith row and the jth column like A[j][i] in C++. Why? Because that is the geometric way to do it. By moving along the x-axis, you change the current column and the x-axis naturally seems to be the first axis to be written in row-major storage. &lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;br /&gt;A[x-axis value][y-axis value]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;But that's plain wrong, in fact all accesses are of the form A[y-axis value][x-axis value]. If you want to increase the current column you're in, you change the second dimension like A[i][j++]&lt;br /&gt;&lt;br /&gt;Weird, right?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-8936411176183398004?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/8936411176183398004/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2008/01/analytical-geometry-vs-programming.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/8936411176183398004'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/8936411176183398004'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2008/01/analytical-geometry-vs-programming.html' title='Analytical Geometry vs. Programming Languages'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-2160655777879871019</id><published>2007-02-23T03:08:00.000-08:00</published><updated>2009-06-14T18:30:33.344-07:00</updated><title type='text'>input/output operator overloading in C++</title><content type='html'>1) &lt;span style="font-style:italic;"&gt;operator&lt;&lt;&lt;/span&gt; and &lt;span style="font-style:italic;"&gt;operator&gt;&gt;&lt;/span&gt; should not be member functions, otherwise you would need to use something like &lt;span style="font-style:italic;"&gt;s &lt;&lt; cout&lt;/span&gt; or &lt;span style="font-style:italic;"&gt;s &gt;&gt; cin&lt;/span&gt; instead of &lt;span style="font-style:italic;"&gt;cout &lt;&lt; s&lt;/span&gt; and &lt;span style="font-style:italic;"&gt;cin &gt;&gt; s&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;2) You will need to make them "friends" since they need to access private members.&lt;br /&gt;&lt;br /&gt;3) Their return type needs to be Objects of type &lt;span style="font-style:italic;"&gt;ostream&amp;&lt;/span&gt; and &lt;span style="font-style:italic;"&gt;istream&amp;&lt;/span&gt;. To see the reason, think about the following case:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;&lt;br /&gt;cout&lt;&lt; s &lt;&lt; "is the sparse matrix!" &lt;&lt; endl;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Since the value of &lt;span style="font-style:italic;"&gt;cout &lt;&lt; s&lt;/span&gt; is &lt;span style="font-style:italic;"&gt;cout&lt;/span&gt; (which is of type &lt;span style="font-style:italic;"&gt;ostream&lt;/span&gt;), both "is the sparse matrix!" and &lt;span style="font-style:italic;"&gt;endl&lt;/span&gt; can be subsequently inserted into cout.&lt;br /&gt;&lt;br /&gt;Thus there is the signature:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;class SparseMatrix&lt;br /&gt;{&lt;br /&gt;public:&lt;br /&gt; ...&lt;br /&gt; friend ostream&amp; operator&lt;&lt; (ostream&amp; out, const SparseMatrix&amp; s); &lt;br /&gt;        ...&lt;br /&gt;&lt;br /&gt;private:&lt;br /&gt; ...&lt;br /&gt;};&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-2160655777879871019?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/2160655777879871019/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2007/02/inputoutput-operator-overloading-in-c.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/2160655777879871019'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/2160655777879871019'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2007/02/inputoutput-operator-overloading-in-c.html' title='input/output operator overloading in C++'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-5226211919719470467</id><published>2007-02-20T23:30:00.000-08:00</published><updated>2009-06-14T18:30:33.344-07:00</updated><title type='text'>Search tree vs. Hashing</title><content type='html'>There is always this question that I had in mind. For instance if hashing support queries and inserts in amortized constant time, why the hell do we need AVL trees, red-black trees and so on?&lt;br /&gt;&lt;br /&gt;Obviously search trees can do some stuff more efficiently than hashing, for instance:&lt;br /&gt;1) Range queries&lt;br /&gt;2) Prefix searching (for instance with Tries)&lt;br /&gt;3) Error tolerant searching.&lt;br /&gt;&lt;br /&gt;For hashing, there is the extra storage (usually we need storage of size 2*n) problem but this also exists in search tree with the pointers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-5226211919719470467?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/5226211919719470467/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2007/02/search-tree-vs-hashing.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/5226211919719470467'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/5226211919719470467'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2007/02/search-tree-vs-hashing.html' title='Search tree vs. Hashing'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-6187274641652738992</id><published>2007-02-15T00:00:00.000-08:00</published><updated>2009-06-14T18:30:33.344-07:00</updated><title type='text'>Puzzles and Algorithms</title><content type='html'>This time, I would like to talk about a clever way of general algorithmic puzzle solving technique.&lt;br /&gt;&lt;br /&gt;My first example is the celebrity problem: A celebrity is someone who knows nobody but is known by everybody. In a group of n people with 1 celebrity among them, what is the minimum number of questions that you should ask to find the celebrity for sure. The questions are of the form: "Do you know THAT guy?"&lt;br /&gt;&lt;br /&gt;The second example is the managers-engineers puzzle: There are &lt;br /&gt;n people in the building, each either an engineer or a manager. The problem facing the FBI is to separate engineers from managers. Each of the  n people knows the status of all the others. The questions are of the form: "Is this person an engineer or a manager?" The engineers always tell the truth. What makes it hard is that the managers may not tell the truth. In fact, the managers are evil geniuses who are conspiring to confuse the interrogators. Under the assumption that more than half of the people are engineers, can you suggest a strategy for the FBI to find one engineer with at most n-1 questions? &lt;br /&gt;&lt;br /&gt;Answers are to be posted... The way will be to eliminate as much people as possible with a single question while still maintaining the problem invariant. &lt;br /&gt;Hint: Think it backwards ! Spotting an engineer or a celebrity may be hard. But it may be easier to spot a manager or a noncelebrity (respectively). Then we will have a subproblem...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-6187274641652738992?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/6187274641652738992/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2007/02/puzzles-and-algorithms.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/6187274641652738992'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/6187274641652738992'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2007/02/puzzles-and-algorithms.html' title='Puzzles and Algorithms'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-9134857945773031175</id><published>2006-10-25T22:37:00.000-07:00</published><updated>2009-06-14T18:30:33.345-07:00</updated><title type='text'>a note on generic array creation</title><content type='html'>&lt;span style="font-style:italic;"&gt;new Scalar[numberOfRows][numberOfCols]&lt;/span&gt;will not work !!!&lt;br /&gt;&lt;br /&gt;This is because the operator &lt;span style="font-style:italic;"&gt;new&lt;/span&gt; is a run-time operation and Java 1.5 removes all generics typing at compile-time. &lt;br /&gt;&lt;br /&gt;This is the well-known generic array creation, an operation not supported by Java 1.5.&lt;br /&gt;&lt;br /&gt;You'd better use &lt;br /&gt;&lt;span style="font-style:italic;"&gt;(Scalar[][]) new Object[numberOfRows][numberOfCols]&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-9134857945773031175?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/9134857945773031175/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2006/10/note-on-generic-array-creation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/9134857945773031175'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/9134857945773031175'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2006/10/note-on-generic-array-creation.html' title='a note on generic array creation'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7454633672335926364.post-5752022998822632227</id><published>2006-10-03T02:07:00.000-07:00</published><updated>2009-06-14T18:30:33.345-07:00</updated><title type='text'>Checksums Unrevealed</title><content type='html'>&lt;a href="http://www.cs.ucsb.edu/~aydin/checksum.htm"&gt;www.cs.ucsb.edu/~aydin/checksum.htm&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7454633672335926364-5752022998822632227?l=aydozz-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://aydozz-tech.blogspot.com/feeds/5752022998822632227/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://aydozz-tech.blogspot.com/2006/10/checksums-unrevealed.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/5752022998822632227'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7454633672335926364/posts/default/5752022998822632227'/><link rel='alternate' type='text/html' href='http://aydozz-tech.blogspot.com/2006/10/checksums-unrevealed.html' title='Checksums Unrevealed'/><author><name>aydozz</name><uri>http://www.blogger.com/profile/04650563446245995209</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='21' src='http://1.bp.blogspot.com/_c9eokIf6YNw/SjWjX5MOfHI/AAAAAAAABcs/UlFO_27EWOY/S220/Oxygen+007.jpg'/></author><thr:total>0</thr:total></entry></feed>
