From rifkin@cco.caltech.edu Sun Feb 13 05:15:41 1994
To: adam@vlsi.cs.caltech.edu
Subject: C++ Faq 3

>Newsgroups: comp.lang.c++
>Path: nntp-server.caltech.edu!netline-fddi.jpl.nasa.gov!elroy.jpl.nasa.gov!swrinde!cs.utexas.edu!howland.reston.ans.net!europa.eng.gtefsd.com!news.umbc.edu!eff!news.kei.com!ub!clarkson!cheetah.ece.clarkson.edu!cline
>From: cline@cheetah.ece.clarkson.edu (Marshall Cline)
>Subject: C++ FAQ: posting #3/4
>Message-ID: <1994Feb11.210649.232@news.clarkson.edu>
>Followup-To: comp.lang.c++
>Summary: Please read this before posting to comp.lang.c++
>Sender: cline@sun.soe.clarkson.edu
>Nntp-Posting-Host: cheetah.ece.clarkson.edu
>Reply-To: cline@parashift.com (Marshall Cline)
>Organization: Paradigm Shift, Inc (training/OOD/C++/libraries)
>Date: Fri, 11 Feb 1994 21:06:49 GMT
>Expires: Fri, 11 Mar 1994 21:06:49 GMT
>Lines: 693

comp.lang.c++ Frequently Asked Questions list (with answers, fortunately).
Copyright (C) 1991-93 Marshall P. Cline, Ph.D.
Posting 3 of 4.
Posting #1 explains copying permissions, (no)warranty, table-of-contents, etc

==============================================================================
SECTION 13: Style guidelines
==============================================================================

Q77: What are some good C++ coding standards?

A: Thank you for reading this answer rather than just trying to set your own
coding standards. But please don't ask this question on Usenet. Nearly every
software engineer has, at some point, felt that coding standards are or can be
used as a `power play'. Furthermore some attempts to set C++ coding standards
have been made by those unfamiliar with the language and/or paradigm, so the
standards end up being based on what *was* the state-of-the-art when the
setters were writing code. Such impositions generate an attitude of mistrust
for coding standards.
Obviously anyone who asks this question on Usenet wants to be trained so they
*don't* run off on their own ignorance, but nonetheless the answers tend to
generate more heat than light.

==============================================================================

Q78: Are coding standards necessary? sufficient?

A: Coding standards do not make non-OO programmers into OO programmers. Only
training and experience do that. If they have merit, it is that coding
standards discourage the petty fragmentation that occurs when organizations
coordinate the activities of diverse groups of programmers.

But you really want more than a coding standard. The structure provided by
coding standards gives neophytes one less degree of freedom to worry about;
however, pragmatics go well beyond pretty-printing standards. We actually need
a consistent *philosophy* of implementation. Ex: strong or weak typing?
references or ptrs in our interface? stream I/O or stdio? should C++ code call
our C? vice versa? should we use ABCs? polymorphism? inheritance? classes?
encapsulation? how should we handle exceptions? etc.

Therefore what is needed is a `pseudo standard' for detailed *design*. How can
we get this? I recommend a two-pronged approach: training and libraries.
Training provides `intense instruction', and a high quality C++ class library
provides `long term instruction'. There is a thriving commercial market for
both kinds of `training'. Advice by organizations who have been through the
mill is consistent: Buy, Don't Build. Buy libraries, buy training, buy tools.
Companies who have attempted to become a self-taught tool-shop as well as an
application/system shop have found success difficult.

Few argue that coding standards are `ideal', or even `good'; however, many feel
that they're necessary in the kind of organizations/situations described above.

The following questions provide some basic guidance in conventions and styles.
============================================================================== Q79: Should our organization determine coding standards from our C experience? A: No matter how vast your C experience, no matter how advanced your C expertise, being a good C programmer does not make you a good C++ programmer. C programmers must learn to use the `++' part of `C++', or the results will be lackluster. People who want the `promise' of OOP, but who fail to put the `OO' into OOP, are fooling themselves, and the balance sheet will show their folly. C++ coding standards should be tempered by C++ experts. Asking comp.lang.c++ is a start (but don't use the term `coding standard' in the question; instead simply say, `what are the pros and cons of this technique?'). Seek out experts who can help guide you away from pitfalls. Get training. Buy libraries and see if `good' libraries pass your coding standards. Do *not* set standards by yourself unless you have considerable experience in C++. Having no standard is better than having a bad standard, since improper `official' positions `harden' bad brain traces. There is a thriving market for both C++ training and libraries from which to pool expertise. One more thing: whenever something is in demand, the potential for charlatans increases. Look before you leap. Also ask for student-reviews from past companies, since not even expertise makes someone a good communicator. Finally, select a practitioner who can teach, not a full time teacher who has a passing knowledge of the language/paradigm. ============================================================================== Q80: Should I declare locals in the middle of a fn or at the top? A: Different people have different opinions about coding standards. However one thing we all should agree on is this: no style guide should impose undue performance penalties. The real reason C++ allows objects to be created anywhere in the block is not style, but performance. 
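To make the performance point concrete, here is a minimal sketch that counts
heap allocations under the two styles. Everything in it is an assumption made
for illustration: `Str' is a hypothetical toy string class (not the String
class the measurements in this answer refer to), and the `allocations' counter
and the two helper fns exist only to expose the difference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy string class that counts every heap allocation it makes.
class Str {
public:
    static int allocations;

    Str() : data_(dup("")) {}
    Str(const char* s) : data_(dup(s)) {}
    Str(const Str& other) : data_(dup(other.data_)) {}
    Str& operator=(const Str& other) {
        if (this != &other) {            // self-assignment test
            char* fresh = dup(other.data_);
            delete[] data_;
            data_ = fresh;
        }
        return *this;
    }
    ~Str() { delete[] data_; }

    std::size_t length() const { return std::strlen(data_); }

private:
    // Copies a C string onto the freestore and bumps the counter.
    static char* dup(const char* s) {
        ++allocations;
        char* p = new char[std::strlen(s) + 1];
        std::strcpy(p, s);
        return p;
    }
    char* data_;
};
int Str::allocations = 0;

// C habit: declare at the top with an `empty' value, assign later.
int allocations_declare_at_top(const char* text) {
    Str::allocations = 0;
    Str s;               // allocation 1: the empty value
    // ... imagine half a function of other work here ...
    s = Str(text);       // allocation 2: the temporary; allocation 3: the copy
    assert(s.length() == std::strlen(text));
    return Str::allocations;
}

// C++ habit: declare at first use, initialized correctly.
int allocations_declare_at_first_use(const char* text) {
    Str::allocations = 0;
    // ... imagine half a function of other work here ...
    Str s(text);         // the only allocation
    assert(s.length() == std::strlen(text));
    return Str::allocations;
}
```

Calling both helpers with the same text shows three allocations for the
declare-at-top style against one for declare-at-first-use. The exact ratio
depends entirely on the class; the sketch only illustrates where the extra
work comes from.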
An object is initialized (constructed) the moment it is declared. If you don't
have enough information to initialize an object until half way down the fn, you
can either initialize it to an `empty' value at the top then `assign' it later,
or initialize it correctly half way down the fn. It doesn't take much
imagination to see that it's cheaper to get it right the first time than it is
to build it once, tear it down, then rebuild it again. Simple examples show a
speed hit of around 350% for simple classes like String. Your mileage may vary;
surely the overall system degradation will be less than 300+%, but there *will*
be degradation. *Unnecessary* degradation.

A common retort to the above is: `we'll provide "set" methods for every datum
in our objects, so the cost of construction will be spread out'. This is worse
than the performance overhead, since now you're introducing a maintenance
nightmare. Providing `set' methods for every datum is tantamount to public
data. You've exposed your implementation technique to the world. The only
thing you've hidden is the physical *names* of your subobjects, but the fact
that you're using a List and a String and a float (for example) is open for all
to see. Maintenance generally consumes far more resources than run-time CPU.

Conclusion: in general, locals should be declared near their first use. Sorry
that this isn't `familiar' to your C experts, but `new' doesn't necessarily
mean `bad'.

==============================================================================

Q81: What source-file-name convention is best? `foo.C'? `foo.cc'? `foo.cpp'?

A: Most Un*x compilers accept `.C' for C++ source files, g++ preferring `.cc',
and cfront also accepting `.c'. Most DOS and OS/2 compilers require `.cpp'
since DOS filesystems aren't case sensitive. Some also advocate `.cxx'. The
impact of this decision is not great, since a trivial shell script can rename
all .cc files into .C files.
The only files that would have to be modified are the Makefiles, which is a
very small piece of your maintenance costs. Note however that some versions of
cfront accept a limited set of suffixes (ie: some can't handle `.cc'; in these
cases it is easier to tell `make' about CC's convention than vice versa).

You can use `.C' on DOS or OS/2 if the compiler provides a command-line option
to tell it to always compile with C++ rules (ex: `ztc -cpp foo.C' for Zortech,
`bcc -P foo.C' for Borland, etc).

==============================================================================

Q82: What header-file-name convention is best? `foo.H'? `foo.hh'? `foo.hpp'?

A: The naming of your source files is cheap since it doesn't affect your source
code. Your substantial investment is your source code. Therefore the names of
your header files must be chosen with much greater care. The preprocessor will
accept whatever name you give it in the #include line, but whatever you choose,
you will want to plan on sticking with it for a long time, since it is more
expensive to change (though certainly not as difficult as, say, porting to a
new language).

Almost all vendors ship their C++ header files using a `.h' extension, which
means you can reliably do things like:

        #include

Some sites use `.H' for their own internally developed header files, but most
simply use `.h'.

==============================================================================

Q83: Are there any lint-like guidelines for C++?

A: Yes, there are some practices which are generally considered dangerous.
However none of these are universally `bad', since situations arise when
even the worst of these is needed:

 * a class `X's assignment operator should return `*this' as an `X&' (allows
   chaining of assignments)
 * a class with any virtual fns ought to have a virtual destructor
 * a class with any of {dtor, assignment-op, copy-ctor} generally needs all 3
 * a class `X's copy-ctor and assignment-op should have `const' in the param:
   `X::X(const X&)' and `X& X::operator=(const X&)' respectively
 * always use initialization lists for class sub-objects rather than assignment
   the performance difference for user-defined classes can be substantial (3x!)
 * many assignment operators should start by testing if `we' are `them'; ex:

        X& X::operator=(const X& x)
        {
          if (this == &x) return *this;
          //...normal assignment duties...
          return *this;
        }

   sometimes there is no need to check, but these situations generally
   correspond to when there's no need for an explicit user-specified assignment
   op (as opposed to a compiler-synthesized assignment-op).
 * in classes that define both `+=', `+' and `=', `a+=b' and `a=a+b' should
   generally do the same thing; ditto for the other identities of builtin types
   (ex: a+=1 and ++a; p[i] and *(p+i); etc). This can be enforced by writing
   the binary ops using the `op=' forms; ex:

        X operator+(const X& a, const X& b)
        {
          X ans = a;
          ans += b;
          return ans;
        }

   This way the `constructive' binary ops don't even need to be friends. But
   it is sometimes possible to more efficiently implement common ops (ex: if
   class `X' is actually `String', and `+=' has to reallocate/copy string
   memory, it may be better to know the eventual length from the beginning).

==============================================================================
SECTION 14: C++/Smalltalk differences and keys to learning C++
==============================================================================

Q84: Why does C++'s FAQ have a section on Smalltalk? Is this Smalltalk-bashing?
A: The two `major' OOPLs in the world are C++ and Smalltalk. Due to its popularity as the OOPL with the second largest user pool, many new C++ programmers come from a Smalltalk background. This section answers the questions: * what's different about the two languages * what must a Smalltalk-turned-C++ programmer know to master C++ This section does *!*NOT*!* attempt to answer the questions: * which language is `better'? * why is Smalltalk `bad'? Nor is it an open invitation for some Smalltalk terrorist to slash my tires while I sleep (on those rare occasions when I have time to rest these days :-). ============================================================================== Q85: What's the difference between C++ and Smalltalk? A: There are many differences such as compiled vs perceived-as-interpreted, pure vs hybrid, faster vs perceived-as-slower, etc. Some of these aren't true (ex: a large portion of a typical Smalltalk program can be compiled by current implementations, and some Smalltalk implementations perform reasonably well). But none of these affect the programmer as much as the following three issues: * static typing vs dynamic typing (`strong' and `weak' are synonyms) * how you use inheritance * value vs reference semantics The first two differences are illuminated in the remainder of this section; the third point is the subject of the section that follows. If you're a Smalltalk programmer who wants to learn C++, you'd be very wise to study the next three questions carefully. Historically there have been many attempts to `make' C++ look/act like Smalltalk, even though the languages are very Very different. This hasn't always led to failures, but the differences are significant enough that it has led to a lot of needless frustration and expense. The quotable quote of the year goes to Bjarne Stroustrup at the `C++ 1995' panel discussion, 1990 C++-At-Work conference, discussing library design: `Smalltalk is the best Smalltalk around'. 
==============================================================================

Q86: What is `static typing', and how is it similar/dissimilar to Smalltalk?

A: Static (most say `strong') typing says the compiler checks the type-safety
of every operation *statically* (at compile-time), rather than generating code
which will check things at run-time. For example, the signature matching of fn
arguments is checked, and an improper match is flagged as an error by the
*compiler*, not at run-time.

In OO code, the most common `typing mismatch' is invoking a member function
against an object which isn't prepared to handle the operation. Ex: if class
`X' has member fn f() but not g(), and `x' is an instance of class X, then
x.f() is legal and x.g() is illegal. C++ (statically/strongly typed) catches
the error at compile time, and Smalltalk (dynamically/weakly typed) catches
`type' errors at run-time. (Technically speaking, C++ is like Pascal [*pseudo*
statically typed], since ptr casts and unions can be used to violate the typing
system; you probably shouldn't use these constructs very much).

==============================================================================

Q87: Which is a better fit for C++: `static typing' or `dynamic typing'?

A: The arguments over the relative goodness of static vs dynamic typing will
continue forever. However one thing is clear: you should use a tool the way it
was intended and designed to be used. If you want to use C++ most effectively,
use it as a statically typed language. C++ is flexible enough that you can
(via ptr casts, unions, and #defines) make it `look' like Smalltalk.

There are places where ptr casts and unions are necessary and even wholesome,
but they should be used carefully and sparingly. A ptr cast tells the compiler
to believe you. It effectively suspends the normal type checking facilities.
An incorrect ptr cast might corrupt your heap, scribble into memory owned by other objects, call nonexistent methods, and cause general failures. It's not a pretty sight. If you avoid these and related constructs, you can make your C++ code both safer and faster -- anything that can be checked at compile time is something that doesn't have to be done at run-time, one `pro' of strong typing. Even if you're in love with weak typing, please consider using C++ as a strongly typed OOPL, or else please consider using another language that better supports your desire to defer typing decisions to run-time. Since C++ performs 100% type checking decisions at compile time, there is *no* built-in mechanism to do *any* type checking at run-time; if you use C++ as a weakly typed OOPL, you put your life in your own hands. ============================================================================== Q88: How can you tell if you have a dynamically typed C++ class library? A: One hint that a C++ class library is weakly typed is when everything is derived from a single root class, usually `Object'. Even more telling is the implementation of the container classes (List, Stack, Set, etc): if these containers are non-templates, and if their elements are inserted/extracted as ptrs to `Object', the container will promote weak typing. You can put an Apple into such a container, but when you get it out, the compiler only knows that it is derived from Object, so you have to do a pointer cast (a `down cast') to cast it `down' to an Apple (you also might hope a lot that you got it right, cause your blood is on your own head). You can make the down cast `safe' by putting a virtual fn into Object such as `are_you_an_Apple()' or perhaps `give_me_the_name_of_your_class()', but this dynamic testing is just that: dynamic. This coding style is the essence of weak typing in C++. 
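A minimal sketch of that style, next to the statically typed alternative, may
help. The names are assumptions made for illustration: `Object', `Apple' and
`are_you_an_Apple()' follow the wording above, while `Orange', `seeds()' and
the two helper fns are hypothetical. The standard `std::vector' template
stands in for both kinds of container.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-rooted hierarchy in the weakly typed style.
class Object {
public:
    virtual ~Object() {}
    virtual bool are_you_an_Apple() const { return false; }
};

class Apple : public Object {
public:
    virtual bool are_you_an_Apple() const { return true; }
    int seeds() const { return 5; }      // hypothetical Apple-only operation
};

class Orange : public Object {};

// Weak typing: elements come out as Object*, so the caller must test and
// down-cast, and only finds out at run-time whether the element fits.
// Returns -1 when the element is not an Apple.
int seeds_weak(const std::vector<Object*>& bag, std::size_t i) {
    Object* o = bag[i];
    if (!o->are_you_an_Apple()) return -1;      // dynamic test
    return static_cast<Apple*>(o)->seeds();     // down cast
}

// Static typing: the element type is part of the container's type, so the
// compiler rejects anything that is not an Apple and no cast is needed.
int seeds_static(const std::vector<Apple>& bag, std::size_t i) {
    return bag[i].seeds();
}
```

With the `Object*' container, putting an Orange in compiles without complaint
and the mismatch only shows up when `seeds_weak' runs. With the container of
Apple, the same mistake does not compile.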
You call a function that says `convert this Object into an Apple or kill
yourself if it's not an Apple', and you've got weak typing: you don't know if
the call will succeed until run-time.

When used with templates, the C++ compiler can statically validate 99% of an
application's typing information (the figure `99%' is apocryphal; some claim
they always get 100%, others find the need to do persistence which cannot be
statically type checked). The point is: C++ gets genericity from templates,
not from inheritance.

==============================================================================

Q89: Will `standard C++' include any dynamic typing primitives?

A: Yep.

Note that the effect of a down-cast and a virtual fn call are similar: in the
member fn that results from the virtual fn call, the `this' ptr is a downcasted
version of what it used to be (it went from ptr-to-Base to ptr-to-Derived).
The difference is that the virtual fn call *always* works: it never makes the
wrong `down-cast' and it automatically extends itself whenever a new subclass
is created -- as if an extra `case' or `if/else' had magically appeared in the
weak typing technique. The other difference is that the client gives control
to the object rather than reasoning *about* the object.

==============================================================================

Q90: How do you use inheritance in C++, and is that different from Smalltalk?

A: There are two reasons one might want to use inheritance: to share code, or
to express your interface compliance.
Ie: given a class `B' (`B' stands for `base class', which is called
`superclass' in Smalltalkese), a class `D' which is derived from B is expressed
this way:

        class B { /*...*/ };
        class D : public B { /*...*/ };

This says two distinct things: (1) the bits(data structure) + code(algorithms)
are inherited from B, and (2) `D's public interface is `conformal' to `B's
(anything you can do to a B, you can also do to a D, plus perhaps some other
things that only D's can do; ie: a D is-a-kind-of-a B).

In C++, one can use inheritance to mean:
        --> #2(is-a) alone (ex: you intend to override most/all inherited code)
        --> both #2(is-a) and #1(code-sharing)
but one should never Never use the above form of inheritance to mean
        --> #1(code-sharing) alone (ex: D really *isn't* a B, but...)

This is a major difference with Smalltalk, where there is only one form of
inheritance (C++ provides `private' inheritance to mean `share the code but
don't conform to the interface'). The Smalltalk language proper (as opposed to
coding practice) allows you to have the *effect* of `hiding' an inherited
method by providing an override that calls the `does not understand' method.
Furthermore Smalltalk allows a conceptual `is-a' relationship to exist *apart*
from the subclassing hierarchy (subtypes don't have to be subclasses; ex: you
can make something that `is-a Stack' yet doesn't inherit from `Stack').

In contrast, C++ is more restrictive about inheritance: there's no way to make
a `conceptual is-a' relationship without using inheritance (the C++ work-around
is to separate interface from implementation via ABCs). The C++ compiler
exploits the added semantic information associated with public inheritance to
provide static typing.

==============================================================================

Q91: What are the practical consequences of diffs in Smalltalk/C++ inheritance?
A: Since Smalltalk lets you make a subtype without making a subclass, one can
be very carefree in putting data (bits, representation, data structure) into a
class (ex: you might put a linked list into a Stack class). After all, if
someone wants an array-based Stack, they don't have to inherit from Stack; they
can go off and make effectively a stand-alone class (they might even *inherit*
from an Array class, even though they're not-a-kind-of-Array!).

In C++, you can't be nearly as carefree. Since only mechanism (method code),
but not representation (data bits) can be overridden in subclasses, you're
usually better off *not* putting the data structure in a class. This leads to
the concept of Abstract Base Classes (ABCs), which are discussed in a separate
question. You can change the algorithm but NOT the data structure. Bits are
forever.

I like to think of the difference between an ATV and a Maserati. An ATV [all
terrain vehicle] is more fun, since you can `play around' by driving through
fields, streams, sidewalks and the like. A Maserati, on the other hand, gets
you there faster, but it forces you to stay on the road. My advice to C++
programmers is simple: stay on the road. Even if you're one of those people
who like the `expressive freedom' to drive through the bushes, don't do it in
C++; it's not a good `fit'.

Note that C++ compilers uphold the is-a semantic constraint only with `public'
inheritance. Neither containment (has-a), nor private or protected inheritance
implies conformance.

==============================================================================

Q92: Do you need to learn a `pure' OOPL before you learn C++?

A: The short answer is, No.

The medium length answer is: learning some `pure' OOPLs may *hurt* rather than
help.
The long answer is: read the previous questions on the difference between C++
and Smalltalk (the usual `pure' OOPL being discussed; `pure' means everything
is an object of some class; `hybrid' [like C++] means things like int, char,
and float are not instances of a class, hence aren't subclassable).

The `purity' of the OOPL doesn't make the transition to C++ any more or less
difficult; it is the weak typing and improper inheritance that is so hard to
get. I've taught numerous people C++ with a Smalltalk background, and they
usually have just as hard a time as those who've never seen inheritance before.
In fact, my personal observation is that those with extensive experience with a
weakly typed OOPL (usually but not always Smalltalk) have a *harder* time,
since it's harder to *unlearn* habits than it is to learn the statically typed
way from the beginning.

==============================================================================

Q93: What is the NIHCL? Where can I get it?

A: NIHCL stands for `national-institute-of-health's-class-library'. It can be
acquired via anonymous ftp from [128.231.128.7] in the file
pub/nihcl-3.0.tar.Z

NIHCL (some people pronounce it `N-I-H-C-L', others pronounce it like `nickel')
is a C++ translation of the Smalltalk class library. There are some ways where
NIHCL's use of weak typing helps (ex: persistent objects). There are also
places where the weak typing it introduces creates tension with the underlying
statically typed language.

A draft version of the 250pp reference manual is included with version 3.10
(gnu emacs TeX-info format). It is not available via uucp, or via regular mail
on tape, disk, paper, etc (at least not from Keith Gorlen).

See previous questions on Smalltalk for more.
==============================================================================
SECTION 15: Reference and value semantics
==============================================================================

Q94: What is value and/or reference semantics, and which is best in C++?

A: With reference semantics, assignment is a pointer-copy (ie: a *reference*).
Value (or `copy') semantics mean assignment copies the value, not just the
pointer. C++ gives you the choice: use the assignment operator to copy the
value (copy/value semantics), or use a ptr-copy to copy a pointer (reference
semantics). C++ allows you to override the assignment operator to do anything
your heart desires, however the default (and most common) choice is to copy the
*value*. Smalltalk and Eiffel and CLOS and most other OOPLs force reference
semantics; you must use an alternate syntax to copy the value (clone,
shallowCopy, deepCopy, etc), but even then, these languages ensure that any
name of an object is actually a *pointer* to that object (Eiffel's `expanded'
classes allow a supplier-side work-around).

There are many pros to reference semantics, including flexibility and dynamic
binding (you get dynamic binding in C++ only when you pass by ptr or pass by
ref, not when you pass by value).

There are also many pros to value semantics, including speed. `Speed' seems
like an odd benefit for a feature that requires an object (vs a ptr) to be
copied, but the fact of the matter is that one usually accesses an object more
than one copies the object, so the cost of the occasional copies is (usually)
more than offset by the benefit of having an actual object rather than a ptr to
an object.

There are three cases when you have an actual object as opposed to a pointer to
an object: local vars, global/static vars, and fully contained subobjects in a
class. The most common & most important of these is the last (`containment').

More info about copy-vs-reference semantics is given in the next questions.
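The basic distinction can be sketched in a few lines. `Point' and the two
helper fns are hypothetical names chosen only for this illustration.

```cpp
#include <cassert>

// Hypothetical small value type.
struct Point {
    int x;
    int y;
};

// Value semantics: initialization/assignment copies the value, so later
// changes to the original do not show through the copy.
int value_copy_is_independent() {
    Point a = {1, 2};
    Point b = a;        // copies the value
    a.x = 99;
    return b.x;         // b kept its own x
}

// Reference semantics: copying a pointer copies only the reference, so both
// pointers name the same object.
int pointer_copy_is_alias() {
    Point a = {1, 2};
    Point* p = &a;
    Point* q = p;       // copies the pointer, not the Point
    p->x = 99;
    return q->x;        // q sees the change made through p
}
```

The first helper returns the original 1 because `b' is a separate object. The
second returns 99 because `p' and `q' are two names for one object.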
Please read them all to get a balanced perspective. The first few have intentionally been slanted toward value semantics, so if you only read the first few of the following questions, you'll get a warped perspective. Assignment has other issues (ex: shallow vs deep copy) which are not covered here. ============================================================================== Q95: What is `virtual data', and how-can / why-would I use it in C++? A: Virtual data isn't strictly a `part' of C++, however it can be simulated. It's not entirely pretty, but it works. First we'll cover what it is and how to simulate it, then conclude with why it isn't `part' of C++. Consider classes Vec (like an array of int) and SVec (a stretchable Vec; ie: SVec overrides operator[] to automatically stretch the number of elements whenever a large index is encountered). SVec inherits from Vec. Naturally Vec's subscript operator is virtual. Now consider a VStack class (Vec-based-Stack). Naturally this Stack has a capacity limited by the fixed number of elements in the underlying Vec data structure. Then someone comes along and wants an SVStack class (SVec based Stack). For some reason, they don't want to merely modify VStack (say, because there are many users already using it). The obvious choice then would be to inherit SVStack from VStack, however then there'd be *two* Vecs in an SVStack object (one explicitly in VStack, the other as the base class subobject in the SVec which is explicitly in the SVStack). That's a lot of extra baggage. There are at least 2 solns: * break the is-a link between SVStack and VStack, text-copy the code from VStack and manually change `Vec' to `SVec'. * activate some sort of virtual data, so subclasses can change the class of subobjects. 
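For concreteness, the two vector classes described above might look roughly
like the sketch below. This is only one possible shape: the FAQ does not give
their definitions, so the `std::vector' storage, the bounds checking, the
`size()' member and the `store_and_report_size' helper are all assumptions
made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Vec: a fixed-size array of int with a virtual subscript op.
class Vec {
public:
    explicit Vec(int n)
        : elems_(static_cast<std::size_t>(n > 0 ? n : 0), 0) {}
    virtual ~Vec() {}

    // Bounds-checked access; std::vector::at throws std::out_of_range
    // when the index is outside the fixed size.
    virtual int& operator[](int i) {
        return elems_.at(static_cast<std::size_t>(i));
    }

    int size() const { return static_cast<int>(elems_.size()); }

protected:
    std::vector<int> elems_;   // storage choice is an assumption
};

// Hypothetical SVec: overrides operator[] to stretch on a large index.
class SVec : public Vec {
public:
    explicit SVec(int n) : Vec(n) {}

    virtual int& operator[](int i) {
        assert(i >= 0);
        if (i >= size())
            elems_.resize(static_cast<std::size_t>(i) + 1, 0);
        return elems_[static_cast<std::size_t>(i)];
    }
};

// Writes through a Vec reference, so the subscript call is bound dynamically:
// an SVec stretches here even though the parameter type is Vec&.
int store_and_report_size(Vec& v, int index, int value) {
    v[index] = value;
    return v.size();
}
```

Because the subscript operator is virtual, any code written against `Vec&'
picks up the stretching behavior when it is handed an `SVec'. That is the
property the virtual data technique relies on.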
To effect virtual data, we need to change the Vec subobject from a physically
contained subobject into a ptr pointing to a dynamically allocated subobject:

 _____original_____                    |_____to_support_virtual_data_____
 class VStack {                        | class VStack {
 public:                               | public:
   VStack(int cap=10)                  |   VStack(int cap=10)
     : v(cap), sp(0) { }               |     : v(*new Vec(cap)), sp(0) { } //FREESTORE
   void push(int x) {v[sp++]=x;}       |   void push(int x) {v[sp++]=x;}   //no change
   int  pop()       {return v[--sp];}  |   int  pop()       {return v[--sp];} //no change
   ~VStack() { }    //unnecessary      |   ~VStack() {delete &v;}          //NECESSARY
 protected:                            | protected:
   Vec v;     //where data stored      |   Vec& v;    //where data is stored
   int sp;    //stack pointer          |   int  sp;   //stack pointer
 };                                    | };

Now the subclass has a shot at overriding the defn of the object referred to as
`v'. Ex: basically SVStack merely needs to bind a new SVec to `v', rather than
letting VStack bind the Vec. However classes can only initialize their *own*
subobjects in an init-list. Even if I had used a ptr rather than a ref, VStack
must be prevented from allocating its own `Vec'. The way we do this is to add
another ctor to VStack that takes a Vec& and does *not* allocate a Vec:

        class VStack {
        protected:
          VStack(Vec& vv) : v(vv), sp(0) { }  //`protected' constructor!
          //...                               //(prevents public access)
        };

That's all there is to it! Now the subclass (SVStack) can be defined as:

        class SVStack : public VStack {
        public:
          SVStack(int init_cap=10) : VStack(*new SVec(init_cap)) { }
        };

Pros:   * implementation of SVStack is a one-liner
        * SVStack shares code with VStack

Cons:   * extra layer of indirection to access the Vec
        * extra freestore allocations (both new and delete)
        * extra dynamic binding (reason given in next question)

We succeeded at making *our* job easier as implementor of SVStack, but all
clients pay for it. It wouldn't be so bad if clients of SVStack paid for it,
after all, they chose to use SVStack (you pay for it if you use it).
However the `optimization' made the users of the plain VStack pay as well!

See the question after the next to find out how much the clients `pay'. Also:
*PLEASE* read the few questions that follow the next one too (YOU WILL NOT GET
A BALANCED PERSPECTIVE WITHOUT THE OTHERS).

==============================================================================

Q96: What's the difference between virtual data and dynamic data?

A: The easiest way to see the distinction is by an analogy with `virtual fns':
A virtual member fn means the declaration (signature) must stay the same in
subclasses, but the defn (body) can be overridden. The overriddenness of an
inherited member fn is a static property of the subclass; it doesn't change
dynamically throughout the life of any particular object, nor is it possible
for distinct objects of the subclass to have distinct defns of the member fn.

Now go back and re-read the previous paragraph, but make these substitutions:
        `member fn' --> `subobject'
        `signature' --> `type'
        `body'      --> `exact class'
After this, you'll have a working defn of virtual data.

`Per-object member fns' (a member fn `f()' which is potentially different in
any given instance of an object) could be handled by burying a function ptr in
the object, then setting the (const) fn ptr during construction.

`Dynamic member fns' (member fns which change dynamically over time) could also
be handled by function ptrs, but this time the fn ptr would not be const.

In the same way, there are three distinct concepts for data members:
 * virtual data: the defn (`class') of the subobject is overridable in
   subclasses provided its declaration (`type') remains the same, and this
   overriddenness is a static property of the [sub]class.
 * per-object-data: any given object of a class can instantiate a different
   conformal (same type) subobject upon initialization (usually a `wrapper'
   object), and the exact class of the subobject is a static property of the
   object that wraps it.
 * dynamic-data: the subobject's exact class can change dynamically over time.

The reason they all look so much the same is that none of this is `supported'
in C++. It's all merely `allowed', and in this case, the mechanism for faking
each of these is the same: a ptr to a (probably abstract) base class. In a
language that made these `first class' abstraction mechanisms, the difference
would be more striking, since they'd each have a different syntactic variant.

==============================================================================

Q97: Should class subobjects be ptrs to freestore allocated objs, or contained?

A: Usually your subobjects should actually be `contained' in the aggregate
class (but not always; `wrapper' objects are a good example of where you want a
ptr/ref; also the N-to-1-uses-a relationship needs something like a ptr/ref).

There are three reasons why fully contained subobjects have better performance
than ptrs to freestore allocated subobjects:
 * extra layer of indirection every time you need to access subobject
 * extra freestore allocations (`new' in ctor, `delete' in dtor)
 * extra dynamic binding (reason given later in this question)

==============================================================================

Q98: What are relative costs of the 3 performance hits of allocated subobjects?

A: The three performance hits are enumerated in the previous question:
 * By itself, an extra layer of indirection is small potatoes.
 * Freestore allocations can be a big problem (standard malloc's performance
   degrades with more small freestore allocations; OO s/w can easily become
   `freestore bound' unless you're careful).
 * Extra dynamic binding comes from having a ptr rather than an object.
   Whenever the C++ compiler can know an object's *exact* class, virtual fn
   calls can be *statically* bound, which allows inlining.
Inlining allows zillions (would you believe half a dozen :-) optimization opportunities such as procedural integration, register lifetime issues, etc. The C++ compiler can know an object's exact class in three circumstances: local variables, global/static variables, and fully-contained subobjects. Thus fully-contained subobjects allow significant optimizations that wouldn't be possible under the `subobjects-by-ptr' approach (this is the main reason that languages which enforce reference-semantics have `inherent' performance problems). ============================================================================== Q99: What is an `inline virtual member fn'? Are they ever actually `inlined'? A: An inline virtual member fn is a member fn that is inline and virtual :-). The second question is much harder to answer. The short answer is `Yes, but'. A virtual call (msg dispatch) via a ptr or ref is always resolved dynamically (at run-time). In these situations, the call is never inlined, since the actual code may be from a derived class that was created after the caller was compiled. The difference between a regular fn call and a virtual fn call is rather small. In C++, the cost of dispatching is rarely a problem. But the lack of inlining in any language can be very Very significant. Ex: simple experiments will show the difference to get as bad as an order of magnitude (for zillions of calls to insignificant member fns, loss of inlining virtual fns can result in 25X speed degradation! [Doug Lea, `Customization in C++', proc Usenix C++ 1990]). This is why endless debates over the actual number of clock cycles required to do a virtual call in language/compiler X on machine Y are largely meaningless.
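
A minimal sketch of the cases discussed so far, assuming two hypothetical classes `Shape' and `Square' (invented here for illustration; they are not from the FAQ). It shows the three circumstances in which the compiler can know the exact class (local, global/static, fully-contained subobject) next to a call through a ref, where it can't. Whether a given compiler actually inlines the first three calls is up to that compiler; the sketch only shows where it is *allowed* to.

```cpp
#include <cassert>

// Hypothetical classes, invented for illustration.
class Shape {
public:
    virtual ~Shape() {}
    // Defined in the class body, so this is an inline virtual member fn.
    virtual int sides() const { return 0; }
};

class Square : public Shape {
public:
    virtual int sides() const { return 4; }
};

// Circumstance 1: global/static variable; exact class is known.
Square globalSquare;

// Circumstance 2: fully-contained subobject; exact class is known.
class Window {
public:
    int sidesOfFrame() const { return frame_.sides(); }
private:
    Square frame_;
};

// Circumstance 3: local variable; exact class is known.
int viaLocal()
{
    Square sq;
    return sq.sides();   // may be bound statically, and then inlined
}

// Exact class NOT known: `s' might refer to an object of a derived
// class written after this fn was compiled, so the call is dispatched
// at run time.
int viaReference(const Shape& s)
{
    return s.sides();
}
```

All four calls give the same answers; the only difference is how much the compiler is permitted to know at the call site.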
Ie: many language implementation vendors make a big stink about how good their msg dispatch strategy is, but if these implementations don't *inline* method calls, the overall system performance would be poor, since it is inlining --*not* dispatching-- that has the greatest performance impact. NOTE: PLEASE READ THE NEXT TWO QUESTIONS TO SEE THE OTHER SIDE OF THIS COIN! ============================================================================== Q100: Sounds like I should never use reference semantics, right? A: Wrong. Reference semantics is A Good Thing. We can't live without pointers. We just don't want our s/w to be One Gigantic Pointer. In C++, you can pick and choose where you want reference semantics (ptrs/refs) and where you'd like value semantics (where objects physically contain other objects etc). In a large system, there should be a balance. However if you implement absolutely *everything* as a pointer, you'll get enormous speed hits. Objects near the problem skin are larger than higher level objects. The *identity* of these `problem space' abstractions is usually more important than their `value'. These combine to indicate reference semantics should be used for problem-space objects (Booch says `Entity Abstractions'; see on `Books'). The question arises: is reference semantics likely to cause a performance problem in these `entity abstractions'? The key insight in answering this question is that the relative interaction frequency is much lower for problem skin abstractions than for low level server objects. Thus we have an *ideal* situation in C++: we can choose reference semantics for objects that need unique identity or that are too large to copy, and we can choose value semantics for the others. The result is very likely to be that the highest frequency objects will end up with value semantics. Thus we install flexibility only where it doesn't hurt us, and performance where we need it most! 
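
To make the balance concrete, here is a small sketch, assuming three hypothetical classes (`Point', `Customer', `Invoice'; none of them come from the FAQ). `Customer' plays the role of an entity abstraction whose identity matters, so it is reached through a ptr and copying is disabled; `Point' is a small, frequently used object that is simply contained by value.

```cpp
#include <cassert>

// Hypothetical names throughout; a sketch under stated assumptions.

// Low-level, high-frequency object: small, freely copied, no identity
// of its own. A candidate for value semantics.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}
    int x() const { return x_; }
    int y() const { return y_; }
    void moveBy(int dx, int dy) { x_ += dx; y_ += dy; }
private:
    int x_, y_;
};

// Problem-space entity: its identity matters more than its value, so
// clients share one object through a ptr or ref.
class Customer {
public:
    Customer() : orders_(0) {}
    void addOrder() { ++orders_; }
    int orders() const { return orders_; }
private:
    Customer(const Customer&);              // declared, never defined:
    Customer& operator=(const Customer&);   // copying is disabled
    int orders_;
};

class Invoice {
public:
    Invoice(Customer& c, const Point& pos) : customer_(&c), position_(pos) {}
    void record() { customer_->addOrder(); }
    Point& position() { return position_; }
private:
    Customer* customer_;   // reference semantics: N invoices, 1 customer
    Point position_;       // value semantics: fully-contained subobject
};
```

Two invoices built from the same `Customer' both update that one customer, while each invoice carries its own private copy of the `Point'.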
These are some of the many issues that come into play with real OO design. OO/C++ mastery takes time and high quality training. That's the investment-price you pay for a powerful tool. ============================================================================== Q101: Does the poor performance of ref semantics mean I should pass-by-value? A: No. In fact, `NO!' :-) The previous questions were talking about *subobjects*, not parameters. Pass-by-value is usually a bad idea when mixed with inheritance (larger subclass objects get `sliced' when passed by value as a base class object). Generally, objects that are part of an inheritance hierarchy should be passed by ref or by ptr, but not by value, since only then do you get the (desired) dynamic binding. Unless compelling reasons are given to the contrary, subobjects should be by value and parameters should be by reference. The discussion in the previous few questions indicates some of the `compelling reasons' for when subobjects should be by reference. -- Marshall Cline -- Marshall P. Cline, Ph.D. / Paradigm Shift Inc / PO Box 5108 / Potsdam NY 13676 cline@parashift.com / 315-353-6100 / FAX: 315-353-6110