From rifkin@cco.caltech.edu Sun Feb 13 05:15:41 1994
To: adam@vlsi.cs.caltech.edu
Subject: C++ Faq 3


>Newsgroups: comp.lang.c++
>Path: nntp-server.caltech.edu!netline-fddi.jpl.nasa.gov!elroy.jpl.nasa.gov!swrinde!cs.utexas.edu!howland.reston.ans.net!europa.eng.gtefsd.com!news.umbc.edu!eff!news.kei.com!ub!clarkson!cheetah.ece.clarkson.edu!cline
>From: cline@cheetah.ece.clarkson.edu (Marshall Cline)
>Subject: C++ FAQ: posting #3/4
>Message-ID: <1994Feb11.210649.232@news.clarkson.edu>
>Followup-To: comp.lang.c++
>Summary: Please read this before posting to comp.lang.c++
>Sender: cline@sun.soe.clarkson.edu
>Nntp-Posting-Host: cheetah.ece.clarkson.edu
>Reply-To: cline@parashift.com (Marshall Cline)
>Organization: Paradigm Shift, Inc (training/OOD/C++/libraries)
>Date: Fri, 11 Feb 1994 21:06:49 GMT
>Expires: Fri, 11 Mar 1994 21:06:49 GMT
>Lines: 693

comp.lang.c++ Frequently Asked Questions list (with answers, fortunately).
Copyright (C) 1991-93 Marshall P. Cline, Ph.D.
Posting 3 of 4.
Posting #1 explains copying permissions, (no)warranty, table-of-contents, etc

==============================================================================
SECTION 13: Style guidelines
==============================================================================

Q77: What are some good C++ coding standards?
A: Thank you for reading this answer rather than just trying to set your own
coding standards.  But please don't ask this question on Usenet.  Nearly every
software engineer has, at some point, felt that coding standards are or can be
used as a `power play'.  Furthermore some attempts to set C++ coding standards
have been made by those unfamiliar with the language and/or paradigm, so the
standards end up being based on what *was* the state-of-the-art when the
setters were writing code.  Such impositions generate an attitude of mistrust
for coding standards.  Obviously anyone who asks this question on Usenet wants
to be trained so they *don't* run off on their own ignorance, but nonetheless
the answers tend to generate more heat than light.

==============================================================================

Q78: Are coding standards necessary?  sufficient?
A: Coding standards do not make non-OO programmers into OO programmers.  Only
training and experience do that.  If they have merit, it is that coding
standards discourage the petty fragmentation that occurs when organizations
coordinate the activities of diverse groups of programmers.

But you really want more than a coding standard.  The structure provided by
coding standards gives neophytes one less degree of freedom to worry about,
but pragmatics go well beyond pretty-printing standards.  We actually need
a consistent *philosophy* of implementation.  Ex: strong or weak typing?
references or ptrs in our interface?  stream I/O or stdio?  should C++ code
call our C?  vice versa?  should we use ABCs?  polymorphism?  inheritance?
classes? encapsulation?  how should we handle exceptions?  etc.

Therefore what is needed is a `pseudo standard' for detailed *design*.  How can
we get this?  I recommend a two-pronged approach: training and libraries.
Training provides `intense instruction', and a high quality C++ class library
provides `long term instruction'.  There is a thriving commercial market for
both kinds of `training'.  Advice by organizations who have been through the
mill is consistent: Buy, Don't Build.  Buy libraries, buy training, buy tools.
Companies who have attempted to become a self-taught tool-shop as well as an
application/system shop have found success difficult.

Few argue that coding standards are `ideal', or even `good', but many feel
that they're necessary in the kind of organizations/situations described above.

The following questions provide some basic guidance in conventions and styles.

==============================================================================

Q79: Should our organization determine coding standards from our C experience?
A: No matter how vast your C experience, no matter how advanced your C
expertise, being a good C programmer does not make you a good C++ programmer.
C programmers must learn to use the `++' part of `C++', or the results will be
lackluster.  People who want the `promise' of OOP, but who fail to put the `OO'
into OOP, are fooling themselves, and the balance sheet will show their folly.

C++ coding standards should be tempered by C++ experts.  Asking comp.lang.c++
is a start (but don't use the term `coding standard' in the question; instead
simply say, `what are the pros and cons of this technique?').  Seek out experts
who can help guide you away from pitfalls.  Get training.  Buy libraries and
see if `good' libraries pass your coding standards.  Do *not* set standards by
yourself unless you have considerable experience in C++.  Having no standard is
better than having a bad standard, since improper `official' positions `harden'
bad brain traces.  There is a thriving market for both C++ training and
libraries from which to pool expertise.

One more thing: whenever something is in demand, the potential for charlatans
increases.  Look before you leap.  Also ask for student-reviews from past
companies, since not even expertise makes someone a good communicator.
Finally, select a practitioner who can teach, not a full time teacher who has a
passing knowledge of the language/paradigm.

==============================================================================

Q80: Should I declare locals in the middle of a fn or at the top?
A: Different people have different opinions about coding standards.  However
one thing we all should agree on is this: no style guide should impose undue
performance penalties.  The real reason C++ allows objects to be created
anywhere in the block is not style, but performance.

An object is initialized (constructed) the moment it is declared.  If you don't
have enough information to initialize an object until half way down the fn, you
can either initialize it to an `empty' value at the top then `assign' it later,
or initialize it correctly half way down the fn.  It doesn't take much
imagination to see that it's cheaper to get it right the first time than it is
to build it once, tear it down, then rebuild it again.  Simple examples show a
factor of 350% speed hit for simple classes like String.  Your mileage may
vary; surely the overall system degradation will be less than 300+%, but there
*will* be degradation.  *Unnecessary* degradation.
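
The difference can be made visible by counting the work done each way.  The
following is only a sketch: `Tracked', its counters, and the two functions are
hypothetical names invented for this illustration, and the counts say nothing
about the timing figures quoted above.

```cpp
#include <cassert>
#include <string>

// Hypothetical class that counts how often each operation runs.
struct Tracked {
    static int default_ctors;
    static int value_ctors;
    static int assignments;

    std::string text;

    Tracked() : text() { ++default_ctors; }
    explicit Tracked(const std::string& s) : text(s) { ++value_ctors; }
    Tracked& operator=(const Tracked& other) {
        if (this == &other) return *this;
        text = other.text;
        ++assignments;
        return *this;
    }
};

int Tracked::default_ctors = 0;
int Tracked::value_ctors = 0;
int Tracked::assignments = 0;

// Declared at the top: built `empty', then assigned once the value is known.
std::string declared_at_top(const std::string& input) {
    Tracked t;                          // 1 default construction
    std::string wanted = input + "!";   // the information arrives half way down
    t = Tracked(wanted);                // 1 more construction plus 1 assignment
    return t.text;
}

// Declared near first use: built once, with the right value.
std::string declared_near_use(const std::string& input) {
    std::string wanted = input + "!";
    Tracked t(wanted);                  // 1 construction, nothing else
    return t.text;
}

// Both styles give the same answer; only the amount of work differs.
void demo_locals() {
    assert(declared_at_top("abc") == declared_near_use("abc"));
}
```

After one call to each function, the first has run two constructors and an
assignment, while the second has run a single constructor.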

A common retort to the above is: `we'll provide "set" methods for every datum
in our objects, so the cost of construction will be spread out'.  This is worse
than the performance overhead, since now you're introducing a maintenance
nightmare.  Providing `set' methods for every datum is tantamount to public
data.  You've exposed your implementation technique to the world.  The only
thing you've hidden is the physical *names* of your subobjects, but the fact
that you're using a List and a String and a float (for example) is open for all
to see.  Maintenance generally consumes far more resources than run-time CPU.

Conclusion: in general, locals should be declared near their first use.  Sorry
that this isn't `familiar' to your C experts, but `new' doesn't necessarily
mean `bad'.

==============================================================================

Q81: What source-file-name convention is best? `foo.C'? `foo.cc'? `foo.cpp'?
A: Most Un*x compilers accept `.C' for C++ source files, g++ preferring `.cc',
and cfront also accepting `.c'.  Most DOS and OS/2 compilers require `.cpp'
since DOS filesystems aren't case sensitive.  Some also advocate `.cxx'.  The
impact of this decision is not great, since a trivial shell script can rename
all .cc files into .C files.  The only files that would have to be modified are
the Makefiles, which is a very small piece of your maintenance costs.  Note
however that some versions of cfront accept a limited set of suffixes (ie: some
can't handle `.cc'; in these cases it is easier to tell `make' about CC's
convention than vice versa).

You can use `.C' on DOS or OS/2 if the compiler provides a command-line option
to tell it to always compile with C++ rules (ex: `ztc -cpp foo.C' for Zortech,
`bcc -P foo.C' for Borland, etc).

==============================================================================

Q82: What header-file-name convention is best? `foo.H'? `foo.hh'? `foo.hpp'?
A: The naming of your source files is cheap since it doesn't affect your source
code.  Your substantial investment is your source code.  Therefore the names of
your header files must be chosen with much greater care.  The preprocessor will
accept whatever name you give it in the #include line, but whatever you choose,
you will want to plan on sticking with it for a long time, since it is more
expensive to change (though certainly not as difficult as, say, porting to a
new language).

Almost all vendors ship their C++ header files using a `.h' extension, which
means you can reliably do things like:
		#include <iostream.h>

Some sites use `.H' for their own internally developed header files, but most
simply use `.h'.

==============================================================================

Q83: Are there any lint-like guidelines for C++?
A: Yes, there are some practices which are generally considered dangerous.
However none of these are universally `bad', since situations arise when
even the worst of these is needed:
 * a class `X's assignment operator should return `*this' as an `X&'
   (allows chaining of assignments)
 * a class with any virtual fns ought to have a virtual destructor
 * a class with any of {dtor, assignment-op, copy-ctor} generally needs all 3
 * a class `X's copy-ctor and assignment-op should have `const' in the param:
   `X::X(const X&)'  and  `X& X::operator=(const X&)'  respectively
 * always use initialization lists for class sub-objects rather than assignment;
   the performance difference for user-defined classes can be substantial (3x!)
 * many assignment operators should start by testing if `we' are `them'; ex:
	X& X::operator=(const X& x)
	{
	  if (this == &x) return *this;
	  //...normal assignment duties...
	  return *this;
	}
   sometimes there is no need to check, but these situations generally
   correspond to when there's no need for an explicit user-specified assignment
   op (as opposed to a compiler-synthesized assignment-op).
 * in classes that define all of `+=', `+' and `=', `a+=b' and `a=a+b' should
   generally do the same thing; ditto for the other identities of builtin types
   (ex: a+=1 and ++a; p[i] and *(p+i); etc).  This can be enforced by writing
   the binary ops using the `op=' forms; ex:
	X operator+(const X& a, const X& b)
	{
	  X ans = a;
	  ans += b;
	  return ans;
	}
   This way the `constructive' binary ops don't even need to be friends.  But
   it is sometimes possible to more efficiently implement common ops (ex: if
   class `X' is actually `String', and `+=' has to reallocate/copy string
   memory, it may be better to know the eventual length from the beginning).
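
Several of the guidelines above can be seen working together in one small
class.  This is a sketch only: `Buffer' and `demo_buffer' are hypothetical
names, the class is deliberately minimal, and it does not cover the virtual
destructor guideline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class that owns a heap array, so it needs all three of
// destructor, copy constructor, and assignment operator.
class Buffer {
public:
    explicit Buffer(const char* s)
        : len_(std::strlen(s)), data_(new char[len_ + 1])   // init list, not assignment
    {
        std::memcpy(data_, s, len_ + 1);
    }

    Buffer(const Buffer& other)                              // `const' in the param
        : len_(other.len_), data_(new char[other.len_ + 1])
    {
        std::memcpy(data_, other.data_, len_ + 1);
    }

    Buffer& operator=(const Buffer& other)                   // `const' in the param
    {
        if (this == &other) return *this;                    // are `we' `them'?
        char* fresh = new char[other.len_ + 1];              // allocate before freeing
        std::memcpy(fresh, other.data_, other.len_ + 1);
        delete[] data_;
        data_ = fresh;
        len_ = other.len_;
        return *this;                                        // allows a = b = c
    }

    ~Buffer() { delete[] data_; }

    Buffer& operator+=(const Buffer& other)
    {
        char* fresh = new char[len_ + other.len_ + 1];
        std::memcpy(fresh, data_, len_);
        std::memcpy(fresh + len_, other.data_, other.len_ + 1);
        delete[] data_;
        data_ = fresh;
        len_ += other.len_;
        return *this;
    }

    const char* c_str() const { return data_; }
    std::size_t size() const { return len_; }

private:
    std::size_t len_;   // declared before data_, so it is initialized first
    char* data_;
};

// Binary `+' written in terms of `+=', so it does not need to be a friend.
Buffer operator+(const Buffer& a, const Buffer& b)
{
    Buffer ans = a;
    ans += b;
    return ans;
}

void demo_buffer()
{
    Buffer a("ab"), b("cd"), c("x");
    c = b = a;                                   // chained assignment
    assert(std::strcmp(c.c_str(), "ab") == 0);

    Buffer& alias = a;
    a = alias;                                   // self-assignment is harmless
    assert(std::strcmp(a.c_str(), "ab") == 0);
}
```

Note that `operator+' here copies `a' and then reallocates inside `+=', which
is the inefficiency the last guideline warns about.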

==============================================================================
SECTION 14: C++/Smalltalk differences and keys to learning C++
==============================================================================

Q84: Why does C++'s FAQ have a section on Smalltalk? Is this Smalltalk-bashing?
A: The two `major' OOPLs in the world are C++ and Smalltalk.  Due to its
popularity as the OOPL with the second largest user pool, many new C++
programmers come from a Smalltalk background.  This section answers the
questions:
 * what's different about the two languages
 * what must a Smalltalk-turned-C++ programmer know to master C++

This section does *!*NOT*!* attempt to answer the questions:
 * which language is `better'?
 * why is Smalltalk `bad'?

Nor is it an open invitation for some Smalltalk terrorist to slash my tires
while I sleep (on those rare occasions when I have time to rest these days :-).

==============================================================================

Q85: What's the difference between C++ and Smalltalk?
A: There are many differences such as compiled vs perceived-as-interpreted,
pure vs hybrid, faster vs perceived-as-slower, etc.  Some of these aren't true
(ex: a large portion of a typical Smalltalk program can be compiled by current
implementations, and some Smalltalk implementations perform reasonably well).
But none of these affect the programmer as much as the following three issues:

	* static typing vs dynamic typing (this FAQ uses `strong' and `weak' as
	  loose synonyms for these)
	* how you use inheritance
	* value vs reference semantics

The first two differences are illuminated in the remainder of this section; the
third point is the subject of the section that follows.

If you're a Smalltalk programmer who wants to learn C++, you'd be very wise to
study the next three questions carefully.  Historically there have been many
attempts to `make' C++ look/act like Smalltalk, even though the languages are
very Very different.  This hasn't always led to failures, but the differences
are significant enough that it has led to a lot of needless frustration and
expense.  The quotable quote of the year goes to Bjarne Stroustrup at the `C++
1995' panel discussion, 1990 C++-At-Work conference, discussing library design:
		`Smalltalk is the best Smalltalk around'.

==============================================================================

Q86: What is `static typing', and how is it similar/dissimilar to Smalltalk?
A: Static (most say `strong') typing says the compiler checks the type-safety
of every operation *statically* (at compile-time), rather than generating code
which will check things at run-time.  For example, the signature matching of fn
arguments is checked, and an improper match is flagged as an error by the
*compiler*, not at run-time.

In OO code, the most common `typing mismatch' is invoking a member function
against an object which isn't prepared to handle the operation.  Ex: if class
`X' has member fn f() but not g(), and `x' is an instance of class X, then
x.f() is legal and x.g() is illegal.  C++ (statically/strongly typed) catches
the error at compile time, and Smalltalk (dynamically/weakly typed) catches
`type' errors at run-time.  (Technically speaking, C++ is like Pascal
[*pseudo* statically typed], since ptr casts and unions can be used to violate
the typing system; you probably shouldn't use these constructs very much).
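
The x.f() / x.g() example might look like this in code.  It is a minimal
sketch; the body of f() and the name `demo_static_typing' are made up for the
illustration.

```cpp
#include <cassert>

class X {
public:
    int f() const { return 42; }
    // deliberately no g()
};

void demo_static_typing() {
    X x;
    assert(x.f() == 42);   // legal: class X declares f()
    // x.g();              // illegal: class X has no member g(), so the
    //                     // compiler rejects this line before the program runs
}
```

Uncommenting the call to g() stops the build; nothing is left to be
discovered at run-time.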

==============================================================================

Q87: Which is a better fit for C++: `static typing' or `dynamic typing'?
A: The arguments over the relative goodness of static vs dynamic typing will
continue forever.  However one thing is clear: you should use a tool as it
was intended and designed to be used.  If you want to use C++ most effectively,
use it as a statically typed language.  C++ is flexible enough that you can
(via ptr casts, unions, and #defines) make it `look' like Smalltalk.

There are places where ptr casts and unions are necessary and even wholesome,
but they should be used carefully and sparingly.  A ptr cast tells the compiler
to believe you.  It effectively suspends the normal type checking facilities.
An incorrect ptr cast might corrupt your heap, scribble into memory owned by
other objects, call nonexistent methods, and cause general failures.  It's not
a pretty sight.  If you avoid these and related constructs, you can make your
C++ code both safer and faster -- anything that can be checked at compile time
is something that doesn't have to be done at run-time, one `pro' of strong
typing.

Even if you're in love with weak typing, please consider using C++ as a
strongly typed OOPL, or else please consider using another language that better
supports your desire to defer typing decisions to run-time.  Since C++ performs
100% of its type checking at compile time, there is *no* built-in mechanism
to do *any* type checking at run-time; if you use C++ as a weakly typed OOPL,
you put your life in your own hands.

==============================================================================

Q88: How can you tell if you have a dynamically typed C++ class library?
A: One hint that a C++ class library is weakly typed is when everything is
derived from a single root class, usually `Object'.  Even more telling is the
implementation of the container classes (List, Stack, Set, etc): if these
containers are non-templates, and if their elements are inserted/extracted as
ptrs to `Object', the container will promote weak typing.  You can put an Apple
into such a container, but when you get it out, the compiler only knows that it
is derived from Object, so you have to do a pointer cast (a `down cast') to
cast it `down' to an Apple (you also might hope a lot that you got it right,
because your blood is on your own head).

You can make the down cast `safe' by putting a virtual fn into Object such as
`are_you_an_Apple()' or perhaps `give_me_the_name_of_your_class()', but this
dynamic testing is just that: dynamic.  This coding style is the essence of
weak typing in C++.  You call a function that says `convert this Object into an
Apple or kill yourself if it's not an Apple', and you've got weak typing: you
don't know if the call will succeed until run-time.

When used with templates, the C++ compiler can statically validate 99% of an
application's typing information (the figure `99%' is apocryphal; some claim
they always get 100%, others find the need to do persistence which cannot be
statically type checked).  The point is: C++ gets genericity from templates,
not from inheritance.
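
To make the contrast concrete, here is a sketch of both styles side by side.
All names (`class_name', `to_apple', `Stack', `demo_containers') are
hypothetical and not taken from any real class library, and `to_apple' returns
a null pointer on a mismatch instead of killing itself.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Weakly typed style: everything derives from a single root class.
class Object {
public:
    virtual ~Object() { }
    virtual const char* class_name() const = 0;   // run-time type query
};

class Apple : public Object {
public:
    const char* class_name() const override { return "Apple"; }
    int seeds() const { return 5; }
};

class Orange : public Object {
public:
    const char* class_name() const override { return "Orange"; }
};

// `Convert this Object into an Apple', answering null if it is not one.
// Whether the conversion succeeds is known only at run-time.
Apple* to_apple(Object* obj) {
    if (std::strcmp(obj->class_name(), "Apple") == 0)
        return static_cast<Apple*>(obj);          // the `down cast'
    return nullptr;
}

// Statically typed style: genericity comes from a template.
template <class T>
class Stack {
public:
    void push(const T& x) { items_.push_back(x); }
    T pop() { T top = items_.back(); items_.pop_back(); return top; }
    bool empty() const { return items_.empty(); }
private:
    std::vector<T> items_;
};

void demo_containers() {
    Apple apple;
    Orange orange;

    // Root-class container: anything goes in, a down cast is needed coming out.
    std::vector<Object*> bag;
    bag.push_back(&apple);
    bag.push_back(&orange);
    assert(to_apple(bag[0]) != nullptr);   // right guess, confirmed at run-time
    assert(to_apple(bag[1]) == nullptr);   // wrong guess, caught only at run-time

    // Template container: no cast, and the compiler rejects an Orange.
    Stack<Apple> apples;
    apples.push(apple);
    // apples.push(orange);                // would not compile
    assert(apples.pop().seeds() == 5);
}
```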

==============================================================================

Q89: Will `standard C++' include any dynamic typing primitives?
A: Yep.

Note that the effect of a down-cast and a virtual fn call are similar: in the
member fn that results from the virtual fn call, the `this' ptr is a downcasted
version of what it used to be (it went from ptr-to-Base to ptr-to-Derived).
The difference is that the virtual fn call *always* works: it never makes the
wrong `down-cast' and it automatically extends itself whenever a new subclass
is created -- as if an extra `case' or `if/else' had magically appeared in the
weak typing technique.  The other difference is that the client gives control
to the object rather than reasoning *about* the object.
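
The two approaches can be compared directly.  The sketch below uses
`dynamic_cast', the run-time cast that standard C++ did go on to include; the
class and function names are hypothetical.

```cpp
#include <cassert>
#include <string>

class Shape {
public:
    virtual ~Shape() { }
    virtual std::string name() const = 0;
};

class Circle : public Shape {
public:
    std::string name() const override { return "circle"; }
};

class Square : public Shape {
public:
    std::string name() const override { return "square"; }
};

// Weak typing technique: the client reasons *about* the object.
std::string name_by_casting(const Shape& s) {
    if (dynamic_cast<const Circle*>(&s)) return "circle";
    if (dynamic_cast<const Square*>(&s)) return "square";
    return "unknown";   // a new subclass lands here until someone adds a case
}

// Virtual fn call: the client gives control to the object.
std::string name_by_virtual(const Shape& s) { return s.name(); }

// A subclass added later, without touching either function above.
class Triangle : public Shape {
public:
    std::string name() const override { return "triangle"; }
};

void demo_dispatch() {
    Circle c;
    Triangle t;
    assert(name_by_casting(c) == name_by_virtual(c));
    assert(name_by_virtual(t) == "triangle");   // extended itself automatically
    assert(name_by_casting(t) == "unknown");    // the if/else chain was not updated
}
```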

==============================================================================

Q90: How do you use inheritance in C++, and is that different from Smalltalk?
A: There are two reasons one might want to use inheritance: to share code, or
to express your interface compliance.  Ie: given a class `B' (`B' stands for
`base class', which is called `superclass' in Smalltalkese), a class `D' which
is derived from B is expressed this way:

	class B { /*...*/ };
	class D : public B { /*...*/ };

This says two distinct things: (1) the bits(data structure) + code(algorithms)
are inherited from B, and (2) `D's public interface is `conformal' to `B's
(anything you can do to a B, you can also do to a D, plus perhaps some other
things that only D's can do; ie: a D is-a-kind-of-a B).

In C++, one can use inheritance to mean:
	--> #2(is-a) alone (ex:you intend to override most/all inherited code)
	--> both #2(is-a) and #1(code-sharing)
but one should never Never use the above form of inheritance to mean
	--> #1(code-sharing) alone (ex: D really *isn't* a B, but...)

This is a major difference with Smalltalk, where there is only one form of
inheritance (C++ provides `private' inheritance to mean `share the code but
don't conform to the interface').  The Smalltalk language proper (as opposed to
coding practice) allows you to have the *effect* of `hiding' an inherited
method by providing an override that calls the `does not understand' method.
Furthermore Smalltalk allows a conceptual `is-a' relationship to exist *apart*
from the subclassing hierarchy (subtypes don't have to be subclasses; ex: you
can make something that `is-a Stack' yet doesn't inherit from `Stack').

In contrast, C++ is more restrictive about inheritance: there's no way to make
a `conceptual is-a' relationship without using inheritance (the C++ work-around
is to separate interface from implementation via ABCs).  The C++ compiler
exploits the added semantic information associated with public inheritance to
provide static typing.
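
A sketch of the distinction, using hypothetical classes `List',
`CountingList', and `Stack' (written in present-day C++ so that the compiler
itself can confirm which conversions exist):

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

class List {
public:
    void append(int x)  { items_.push_back(x); }
    void prepend(int x) { items_.insert(items_.begin(), x); }
    int  last() const   { return items_.back(); }
    void remove_last()  { items_.pop_back(); }
    int  size() const   { return static_cast<int>(items_.size()); }
private:
    std::vector<int> items_;
};

// Public inheritance: a CountingList is-a-kind-of-a List, and shares its code.
class CountingList : public List {
public:
    CountingList() : appends_(0) { }
    void append_counted(int x) { append(x); ++appends_; }
    int appends() const { return appends_; }
private:
    int appends_;
};

// Private inheritance: share the code, but don't conform to the interface.
// A Stack is *not* a kind of List (you can't prepend to a Stack).
class Stack : private List {
public:
    void push(int x) { append(x); }
    int  pop()       { int top = last(); remove_last(); return top; }
    using List::size;                    // re-export only what makes sense
};

// The compiler upholds is-a only for public inheritance.
static_assert(std::is_convertible<CountingList*, List*>::value,
              "public inheritance: CountingList* converts to List*");
static_assert(!std::is_convertible<Stack*, List*>::value,
              "private inheritance: Stack* does not convert to List*");

void demo_inheritance() {
    Stack s;
    s.push(1);
    s.push(2);
    // s.prepend(0);                     // would not compile: not part of Stack
    assert(s.pop() == 2);
    assert(s.size() == 1);

    CountingList c;
    c.append_counted(7);
    List& as_list = c;                   // fine: anything a List can do, c can do
    as_list.prepend(3);
    assert(c.size() == 2 && c.appends() == 1);
}
```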

==============================================================================

Q91: What are the practical consequences of diffs in Smalltalk/C++ inheritance?
A: Since Smalltalk lets you make a subtype without making a subclass, one can
be very carefree in putting data (bits, representation, data structure) into a
class (ex: you might put a linked list into a Stack class).  After all, if
someone wants something that is an array-based-Stack, they don't have to inherit
from Stack; they can go off and make effectively a stand-alone class (they
might even *inherit* from an Array class, even though they're not-a-kind-of-
Array!).

In C++, you can't be nearly as carefree.  Since only mechanism (method code),
but not representation (data bits) can be overridden in subclasses, you're
usually better off *not* putting the data structure in a class.  This leads to
the concept of Abstract Base Classes (ABCs), which are discussed in a separate
question.  You can change the algorithm but NOT the data structure.  Bits are
forever.
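
As a sketch of what this means in practice, the base class below carries no
bits at all, so each subclass is free to choose its own data structure.  The
class names and `drain_sum' are hypothetical.

```cpp
#include <cassert>
#include <list>
#include <vector>

// ABC: interface only, no representation.
class Stack {
public:
    virtual ~Stack() { }
    virtual void push(int x) = 0;
    virtual int  pop() = 0;
    virtual bool empty() const = 0;
};

// One subclass picks an array-like representation...
class ArrayStack : public Stack {
public:
    void push(int x) override { items_.push_back(x); }
    int  pop() override { int top = items_.back(); items_.pop_back(); return top; }
    bool empty() const override { return items_.empty(); }
private:
    std::vector<int> items_;
};

// ...another picks a linked list.  Neither inherits bits it does not want.
class ListStack : public Stack {
public:
    void push(int x) override { items_.push_front(x); }
    int  pop() override { int top = items_.front(); items_.pop_front(); return top; }
    bool empty() const override { return items_.empty(); }
private:
    std::list<int> items_;
};

// Client code is written against the ABC and works with either.
int drain_sum(Stack& s) {
    int total = 0;
    while (!s.empty()) total += s.pop();
    return total;
}

void demo_abc() {
    ArrayStack a;
    ListStack l;
    for (int i = 1; i <= 3; ++i) { a.push(i); l.push(i); }
    assert(drain_sum(a) == 6);
    assert(drain_sum(l) == 6);
}
```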

I like to think of the difference between an ATV and a Maserati.  An ATV [all
terrain vehicle] is more fun, since you can `play around' by driving through
fields, streams, sidewalks and the like.  A Maserati, on the other hand, gets
you there faster, but it forces you to stay on the road.  My advice to C++
programmers is simple: stay on the road.  Even if you're one of those people
who like the `expressive freedom' to drive through the bushes, don't do it in
C++; it's not a good `fit'.

Note that C++ compilers uphold the is-a semantic constraint only with `public'
inheritance.  Neither containment (has-a), nor private or protected inheritance
implies conformance.

==============================================================================

Q92: Do you need to learn a `pure' OOPL before you learn C++?
A: The short answer is, No.

The medium length answer is: learning some `pure' OOPLs may *hurt*
rather than help.

The long answer is: read the previous questions on the difference between C++
and Smalltalk (the usual `pure' OOPL being discussed; `pure' means everything
is an object of some class; `hybrid' [like C++] means things like int, char,
and float are not instances of a class, hence aren't subclassable).

The `purity' of the OOPL doesn't make the transition to C++ any more or less
difficult; it is the weak typing and improper inheritance that is so hard to
get.  I've taught numerous people C++ with a Smalltalk background, and they
usually have just as hard a time as those who've never seen inheritance before.
In fact, my personal observation is that those with extensive experience with a
weakly typed OOPL (usually but not always Smalltalk) have a *harder* time,
since it's harder to *unlearn* habits than it is to learn the statically typed
way from the beginning.

==============================================================================

Q93: What is the NIHCL?  Where can I get it?
A: NIHCL stands for `national-institute-of-health's-class-library'.
It can be acquired via anonymous ftp from [128.231.128.7]
in the file pub/nihcl-3.0.tar.Z

NIHCL (some people pronounce it `N-I-H-C-L', others pronounce it like `nickel')
is a C++ translation of the Smalltalk class library.  There are some ways where
NIHCL's use of weak typing helps (ex: persistent objects).  There are also
places where the weak typing it introduces creates tension with the underlying
statically typed language.

A draft version of the 250pp reference manual is included with version 3.10
(gnu emacs TeX-info format).  It is not available via uucp, or via regular mail
on tape, disk, paper, etc (at least not from Keith Gorlen).

See previous questions on Smalltalk for more.

==============================================================================
SECTION 15: Reference and value semantics
==============================================================================

Q94: What is value and/or reference semantics, and which is best in C++?
A: With reference semantics, assignment is a pointer-copy (ie: a *reference*).
Value (or `copy') semantics mean assignment copies the value, not just the
pointer.  C++ gives you the choice: use the assignment operator to copy the
value (copy/value semantics), or use a ptr-copy to copy a pointer (reference
semantics).  C++ allows you to override the assignment operator to do anything
your heart desires, however the default (and most common) choice is to copy the
*value*.  Smalltalk and Eiffel and CLOS and most other OOPLs force reference
semantics; you must use an alternate syntax to copy the value (clone,
shallowCopy, deepCopy, etc), but even then, these languages ensure that any
name of an object is actually a *pointer* to that object (Eiffel's `expanded'
classes allow a supplier-side work-around).
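
In code, the choice looks roughly like this (a sketch; `Account' and
`demo_semantics' are hypothetical names):

```cpp
#include <cassert>
#include <string>

struct Account {
    std::string owner;
    int balance;
};

void demo_semantics() {
    Account a = { "ann", 100 };

    // Value semantics: assignment copies the value, so b is independent of a.
    Account b = { "bob", 0 };
    b = a;
    b.balance = 5;
    assert(a.balance == 100);

    // Reference semantics: assignment copies only the pointer,
    // so p and q are two names for the same object.
    Account* p = &a;
    Account* q = nullptr;
    q = p;
    q->balance = 0;
    assert(a.balance == 0);
    assert(p->balance == 0);
}
```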

There are many pros to reference semantics, including flexibility and dynamic
binding (you get dynamic binding in C++ only when you pass by ptr or pass by
ref, not when you pass by value).
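
The parenthetical remark about dynamic binding can be checked with a small
sketch (hypothetical `Base' and `Derived' classes):

```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() { }
    virtual std::string who() const { return "Base"; }
};

class Derived : public Base {
public:
    std::string who() const override { return "Derived"; }
};

// Pass by value: only the Base part is copied, so the callee has a plain Base.
std::string by_value(Base b)      { return b.who(); }

// Pass by ref or by ptr: the callee sees the real object; dynamic binding works.
std::string by_ref(const Base& b) { return b.who(); }
std::string by_ptr(const Base* b) { return b->who(); }

void demo_binding() {
    Derived d;
    assert(by_value(d) == "Base");
    assert(by_ref(d)   == "Derived");
    assert(by_ptr(&d)  == "Derived");
}
```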

There are also many pros to value semantics, including speed.  `Speed' seems
like an odd benefit for a feature that requires an object (vs a ptr) to be
copied, but the fact of the matter is that one usually accesses an object more
than one copies the object, so the cost of the occasional copies is (usually)
more than offset by the benefit of having an actual object rather than a ptr to
an object.

There are three cases when you have an actual object as opposed to a pointer to
an object: local vars, global/static vars, and fully contained subobjects in a
class.  The most common & most important of these is the last (`containment').

More info about copy-vs-reference semantics is given in the next questions.
Please read them all to get a balanced perspective.  The first few have
intentionally been slanted toward value semantics, so if you only read the
first few of the following questions, you'll get a warped perspective.

Assignment has other issues (ex: shallow vs deep copy) which are not covered
here.

==============================================================================

Q95: What is `virtual data', and how-can / why-would I use it in C++?
A: Virtual data isn't strictly a `part' of C++, however it can be simulated.
It's not entirely pretty, but it works.  First we'll cover what it is and how
to simulate it, then conclude with why it isn't `part' of C++.

Consider classes Vec (like an array of int) and SVec (a stretchable Vec; ie:
SVec overrides operator[] to automatically stretch the number of elements
whenever a large index is encountered).  SVec inherits from Vec.  Naturally
Vec's subscript operator is virtual.

Now consider a VStack class (Vec-based-Stack).  Naturally this Stack has a
capacity limited by the fixed number of elements in the underlying Vec data
structure.  Then someone comes along and wants an SVStack class (SVec based
Stack).  For some reason, they don't want to merely modify VStack (say, because
there are many users already using it).

The obvious choice then would be to inherit SVStack from VStack, however
then there'd be *two* Vecs in an SVStack object (one explicitly in VStack,
the other as the base class subobject in the SVec which is explicitly in
the SVStack).  That's a lot of extra baggage.  There are at least 2 solns:
 * break the is-a link between SVStack and VStack, text-copy the code from
   VStack and manually change `Vec' to `SVec'.
 * activate some sort of virtual data, so subclasses can change the
   class of subobjects.

To effect virtual data, we need to change the Vec subobject from a physically
contained subobject into a ptr pointing to a dynamically allocated subobject:

_____original_____		|_____to_support_virtual_data_____
class VStack {			| class VStack {
public:				| public:
  VStack(int cap=10)		|   VStack(int cap=10)
    : v(cap), sp(0) { }		|     : v(*new Vec(cap)), sp(0) { } //FREESTORE
  void push(int x) {v[sp++]=x;}	|   void push(int x) {v[sp++]=x;}   //no change
  int  pop()  {return v[--sp];}	|   int  pop()  {return v[--sp];}   //no change
 ~VStack() { }   //unnecessary	|  ~VStack()    {delete &v;}        //NECESSARY
protected:			| protected:
  Vec v;  //where data stored	|   Vec& v; //where data is stored
  int sp; //stack pointer	|   int sp; //stack pointer
};				| };

Now the subclass has a shot at overriding the defn of the object referred to as
`v'.  Ex: basically SVStack merely needs to bind a new SVec to `v', rather than
letting VStack bind the Vec.  However classes can only initialize their *own*
subobjects in an init-list.  Even if I had used a ptr rather than a ref, VStack
must be prevented from allocating its own `Vec'.  The way we do this is to add
another ctor to VStack that takes a Vec& and does *not* allocate a Vec:

	class VStack {
	protected:
	  VStack(Vec& vv) : v(vv), sp(0) { }	//`protected' constructor!
	//...					//(prevents public access)
	};

That's all there is to it!  Now the subclass (SVStack) can be defined as:

	class SVStack : public VStack {
	public:
	  SVStack(int init_cap=10) : VStack(*new SVec(init_cap)) { }
	};

Pros:	* implementation of SVStack is a one-liner
	* SVStack shares code with VStack

Cons:	* extra layer of indirection to access the Vec
	* extra freestore allocations (both new and delete)
	* extra dynamic binding (reason given in next question)
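
Putting the pieces together gives something like the following.  The bodies of
`Vec' and `SVec' are guesses (the text above only describes what they do), and
the virtual destructors and disabled copying are additions needed to make the
`delete &v' trick safe; treat it as a sketch, not as the definitive form.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumed Vec: fixed number of ints, virtual subscript operator.
class Vec {
public:
    explicit Vec(int n) : data_(static_cast<std::size_t>(n), 0) { }
    virtual ~Vec() { }                        // VStack deletes through a Vec&
    virtual int& operator[](int i) {          // throws std::out_of_range if too big
        return data_.at(static_cast<std::size_t>(i));
    }
    int size() const { return static_cast<int>(data_.size()); }
protected:
    std::vector<int> data_;
};

// Assumed SVec: stretches itself when a large (non-negative) index is used.
class SVec : public Vec {
public:
    explicit SVec(int n) : Vec(n) { }
    int& operator[](int i) override {
        if (i >= size()) data_.resize(static_cast<std::size_t>(i) + 1, 0);
        return data_[static_cast<std::size_t>(i)];
    }
};

class VStack {
public:
    explicit VStack(int cap = 10) : v(*new Vec(cap)), sp(0) { }   // FREESTORE
    virtual ~VStack() { delete &v; }                              // NECESSARY
    VStack(const VStack&) = delete;             // a copy would delete v twice
    VStack& operator=(const VStack&) = delete;
    void push(int x) { v[sp++] = x; }
    int  pop()       { return v[--sp]; }        // no underflow check, as above
protected:
    explicit VStack(Vec& vv) : v(vv), sp(0) { } // takes over ownership of vv
    Vec& v;   // where data is stored
    int sp;   // stack pointer
};

class SVStack : public VStack {
public:
    explicit SVStack(int init_cap = 10) : VStack(*new SVec(init_cap)) { }
};

void demo_virtual_data() {
    SVStack s(2);
    for (int i = 0; i < 5; ++i) s.push(i);      // stretches past capacity 2
    assert(s.pop() == 4);
}
```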

We succeeded at making *our* job easier as implementor of SVStack, but all
clients pay for it.  It wouldn't be so bad if clients of SVStack paid for it,
after all, they chose to use SVStack (you pay for it if you use it).  However
the `optimization' made the users of the plain VStack pay as well!

See the question after the next to find out how much the clients `pay'.  Also:
*PLEASE* read the few questions that follow the next one too (YOU WILL NOT GET
A BALANCED PERSPECTIVE WITHOUT THE OTHERS).

==============================================================================

Q96: What's the difference between virtual data and dynamic data?
A: The easiest way to see the distinction is by an analogy with `virtual fns':
A virtual member fn means the declaration (signature) must stay the same in
subclasses, but the defn (body) can be overridden.  The overriddenness of an
inherited member fn is a static property of the subclass; it doesn't change
dynamically throughout the life of any particular object, nor is it possible
for distinct objects of the subclass to have distinct defns of the member fn.

Now go back and re-read the previous paragraph, but make these substitutions:
	`member fn' --> `subobject'
	`signature' --> `type'
	`body'      --> `exact class'
After this, you'll have a working defn of virtual data.

`Per-object member fns' (a member fn `f()' which is potentially different in
any given instance of an object) could be handled by burying a function ptr in
the object, then setting the (const) fn ptr during construction.

`Dynamic member fns' (member fns which change dynamically over time) could also
be handled by function ptrs, but this time the fn ptr would not be const.
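
Both fn ptr techniques can be sketched as follows.  The names `Op',
`PerObject', `Dynamic', and `rebind' are hypothetical.

```cpp
#include <cassert>

typedef int (*Op)(int);             // ptr to a fn taking and returning int

int twice(int x)   { return 2 * x; }
int negated(int x) { return -x; }

// Per-object member fn: the fn ptr is const, fixed during construction.
class PerObject {
public:
    explicit PerObject(Op op) : op_(op) { }
    int f(int x) const { return op_(x); }
private:
    const Op op_;                   // the ptr itself can never be re-aimed
};

// Dynamic member fn: same idea, but the fn ptr is not const.
class Dynamic {
public:
    explicit Dynamic(Op op) : op_(op) { }
    void rebind(Op op) { op_ = op; }
    int f(int x) const { return op_(x); }
private:
    Op op_;
};

void demo_fn_ptrs() {
    PerObject a(twice), b(negated); // same class, different f() per object
    assert(a.f(3) == 6);
    assert(b.f(3) == -3);

    Dynamic d(twice);               // f() changes over the object's life
    assert(d.f(3) == 6);
    d.rebind(negated);
    assert(d.f(3) == -3);
}
```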

In the same way, there are three distinct concepts for data members:
 * virtual data: the defn (`class') of the subobject is overridable in
   subclasses provided its declaration (`type') remains the same, and this
   overriddenness is a static property of the [sub]class.
 * per-object-data: any given object of a class can instantiate a different
   conformal (same type) subobject upon initialization (usually a `wrapper'
   object), and the exact class of the subobject is a static property of the
   object that wraps it.
 * dynamic-data: the subobject's exact class can change dynamically over time.

The reason they all look so much the same is that none of this is `supported'
in C++.  It's all merely `allowed', and in this case, the mechanism for faking
each of these is the same: a ptr to a (probably abstract) base class.  In a
language that made these `first class' abstraction mechanisms, the difference
would be more striking, since they'd each have a different syntactic variant.
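
A minimal sketch of the three concepts, all faked with the same ptr to an
abstract base class.  Every name here (`Part', `SmallPart', `BigPart',
`Widget', `BigWidget', `PerObject', `Dynamic') is hypothetical, and the code
assumes a present-day compiler: `std::unique_ptr' and `override' postdate this
FAQ and are used only to keep the ownership bookkeeping short.

```cpp
#include <cassert>
#include <memory>

class Part {                         // the subobject's `type'
public:
  virtual ~Part() { }
  virtual int size() const = 0;
};
class SmallPart : public Part {      // one `exact class'
public:
  int size() const override { return 1; }
};
class BigPart : public Part {        // another `exact class'
public:
  int size() const override { return 9; }
};

// Virtual data: the exact class is a static property of the [sub]class.
class Widget {
public:
  Widget() : part_(new SmallPart) { }
  virtual ~Widget() { }
  int size() const { return part_->size(); }
protected:
  explicit Widget(Part* p) : part_(p) { }   // lets a subclass override it
private:
  const std::unique_ptr<Part> part_;
};
class BigWidget : public Widget {
public:
  BigWidget() : Widget(new BigPart) { }
};

// Per-object data: each object picks its subobject at initialization.
class PerObject {
public:
  explicit PerObject(Part* p) : part_(p) { }   // takes ownership of p
  int size() const { return part_->size(); }
private:
  const std::unique_ptr<Part> part_;
};

// Dynamic data: the ptr is not const, so the exact class can change.
class Dynamic {
public:
  explicit Dynamic(Part* p) : part_(p) { }     // takes ownership of p
  void replace(Part* p) { part_.reset(p); }
  int size() const { return part_->size(); }
private:
  std::unique_ptr<Part> part_;
};
```

Note that the three classes differ only in who chooses the `Part' and whether
the ptr is const, which is the point made above about them looking alike.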

==============================================================================

Q97: Should class subobjects be ptrs to freestore allocated objs, or contained?
A: Usually your subobjects should actually be `contained' in the aggregate
class (but not always; `wrapper' objects are a good example of where you want
a ptr/ref; also the N-to-1-uses-a relationship needs something like a ptr/ref).

There are three reasons why fully contained subobjects have better performance
than ptrs to freestore allocated subobjects:
	* extra layer of indirection every time you need to access the subobject
	* extra freestore allocations (`new' in ctor, `delete' in dtor)
	* extra dynamic binding (reason given in the next question)
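
To make the first two costs concrete, here is a sketch of the same aggregate
written both ways.  `Engine', `Car' and `PtrCar' are invented names, and the
`= delete' syntax assumes a present-day compiler.  The third cost needs
virtual fns and so is not shown here.

```cpp
#include <cassert>

class Engine {                       // hypothetical subobject class
public:
  Engine() : revs_(0) { }
  void rev() { ++revs_; }
  int revs() const { return revs_; }
private:
  int revs_;
};

// Contained: the Engine is part of the Car; no `new', no extra indirection.
class Car {
public:
  void drive() { engine_.rev(); }
  int revs() const { return engine_.revs(); }
private:
  Engine engine_;
};

// By ptr: `new' in the ctor, `delete' in the dtor, a hop on every access.
class PtrCar {
public:
  PtrCar() : engine_(new Engine) { }
  ~PtrCar() { delete engine_; }
  PtrCar(const PtrCar&) = delete;             // avoids a double delete
  PtrCar& operator=(const PtrCar&) = delete;
  void drive() { engine_->rev(); }
  int revs() const { return engine_->revs(); }
private:
  Engine* engine_;
};

void containedDemo()
{
  Car c;                             // one object, one allocation at most
  PtrCar p;                          // always a second, freestore allocation
  c.drive();
  p.drive();
  assert(c.revs() == 1 && p.revs() == 1);
}
```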

==============================================================================

Q98: What are relative costs of the 3 performance hits of allocated subobjects?
A: The three performance hits are enumerated in the previous question:
 * By itself, an extra layer of indirection is small potatoes.
 * Freestore allocations can be a big problem (standard malloc's performance
   degrades with more small freestore allocations; OO s/w can easily become
   `freestore bound' unless you're careful).
 * Extra dynamic binding comes from having a ptr rather than an object.
   Whenever the C++ compiler can know an object's *exact* class, virtual fn
   calls can be *statically* bound, which allows inlining.  Inlining allows
   zillions (would you believe half a dozen :-) optimization opportunities
   such as procedural integration, register lifetime issues, etc.  The C++
   compiler can know an object's exact class in three circumstances: local
   variables, global/static variables, and fully-contained subobjects.

Thus fully-contained subobjects allow significant optimizations that wouldn't
be possible under the `subobjects-by-ptr' approach (this is the main reason
that languages which enforce reference-semantics have `inherent' performance
problems).
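
The three circumstances can be sketched as follows, with `Shape', `Square',
`Tile', `viaPtr' and `knownClassDemo' being hypothetical names.  Whether a
given compiler really binds these calls statically, let alone inlines them, is
up to the implementation; the sketch only shows where it is *allowed* to.

```cpp
#include <cassert>

class Shape {
public:
  virtual ~Shape() { }
  virtual int sides() const { return 0; }
};
class Square : public Shape {
public:
  int sides() const override { return 4; }
};

Square globalSquare;                 // 2. global/static variable

class Tile {
public:
  int sides() const { return square_.sides(); }   // exact class: Square
private:
  Square square_;                    // 3. fully-contained subobject
};

// Via a ptr the exact class is unknown, so the call is bound dynamically.
int viaPtr(const Shape* s) { return s->sides(); }

void knownClassDemo()
{
  Square local;                      // 1. local variable
  Tile tile;
  assert(local.sides() == 4);
  assert(globalSquare.sides() == 4);
  assert(tile.sides() == 4);
  assert(viaPtr(&local) == 4);       // same answer, found at run-time
}
```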

==============================================================================

Q99: What is an `inline virtual member fn'?  Are they ever actually `inlined'?
A: An inline virtual member fn is a member fn that is inline and virtual :-).
The second question is much harder to answer.  The short answer is `Yes, but'.

A virtual call (msg dispatch) via a ptr or ref is always resolved dynamically
(at run-time).  In these situations, the call is never inlined, since the
actual code may be from a derived class that was created after the caller was
compiled.
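
A small sketch of both cases, using invented names (`Counter',
`DoubleCounter', `tally', `tallyLocal').  The comments describe what a
compiler is permitted to do, not what any particular one does.

```cpp
#include <cassert>

class Counter {
public:
  Counter() : n_(0) { }
  virtual ~Counter() { }
  // Defined in the class body, hence implicitly inline; also virtual.
  virtual void bump() { ++n_; }
  int count() const { return n_; }
protected:
  int n_;
};

// Imagine this class being written long after tally() was compiled.
class DoubleCounter : public Counter {
public:
  void bump() override { n_ += 2; }
};

// Via a ref the exact class is unknown, so bump() is dispatched dynamically.
void tally(Counter& c, int times)
{
  for (int i = 0; i < times; ++i)
    c.bump();
}

// On a local object the exact class is known, so bump() may be inlined.
int tallyLocal(int times)
{
  Counter c;
  for (int i = 0; i < times; ++i)
    c.bump();
  return c.count();
}
```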

The difference between a regular fn call and a virtual fn call is rather small.
In C++, the cost of dispatching is rarely a problem.  But the lack of inlining
in any language can be very Very significant.  Ex: simple experiments will show
the difference to get as bad as an order of magnitude (for zillions of calls to
insignificant member fns, loss of inlining virtual fns can result in 25X speed
degradation! [Doug Lea, `Customization in C++', proc Usenix C++ 1990]).

This is why endless debates over the actual number of clock cycles required to
do a virtual call in language/compiler X on machine Y are largely meaningless.
Ie: many language implementation vendors make a big stink about how good their
msg dispatch strategy is, but if these implementations don't *inline* method
calls, the overall system performance would be poor, since it is inlining
--*not* dispatching-- that has the greatest performance impact.

NOTE: PLEASE READ THE NEXT TWO QUESTIONS TO SEE THE OTHER SIDE OF THIS COIN!

==============================================================================

Q100: Sounds like I should never use reference semantics, right?
A: Wrong.

Reference semantics is A Good Thing.  We can't live without pointers.  We just
don't want our s/w to be One Gigantic Pointer.  In C++, you can pick and choose
where you want reference semantics (ptrs/refs) and where you'd like value
semantics (where objects physically contain other objects etc).  In a large
system, there should be a balance.  However if you implement absolutely
*everything* as a pointer, you'll get enormous speed hits.

Objects near the problem skin are larger than higher level objects.  The
*identity* of these `problem space' abstractions is usually more important than
their `value'.  These combine to indicate reference semantics should be used
for problem-space objects (Booch says `Entity Abstractions'; see the question
on `Books').

The question arises: is reference semantics likely to cause a performance
problem in these `entity abstractions'?  The key insight in answering this
question is that the relative interaction frequency is much lower for problem
skin abstractions than for low level server objects.

Thus we have an *ideal* situation in C++: we can choose reference semantics for
objects that need unique identity or that are too large to copy, and we can
choose value semantics for the others.  The result is very likely to be that
the highest frequency objects will end up with value semantics.  Thus we
install flexibility only where it doesn't hurt us, and performance where we
need it most!
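
One way this mix might look in code is sketched below.  `Money', `Account' and
`Statement' are hypothetical classes chosen for the illustration: the
low-level, high-frequency `Money' gets value semantics, while the `Account'
entity is referred to rather than copied.  `= delete' assumes a present-day
compiler.

```cpp
#include <cassert>

// Value semantics: small, copied freely, compared by value.
class Money {
public:
  explicit Money(long cents = 0) : cents_(cents) { }
  long cents() const { return cents_; }
  Money operator+(Money rhs) const { return Money(cents_ + rhs.cents_); }
  bool operator==(Money rhs) const { return cents_ == rhs.cents_; }
private:
  long cents_;
};

// Entity abstraction: identity matters, so copying is disabled.
class Account {
public:
  Account() { }
  Account(const Account&) = delete;
  Account& operator=(const Account&) = delete;
  void deposit(Money m) { balance_ = balance_ + m; }
  Money balance() const { return balance_; }
private:
  Money balance_;                    // contained: value semantics
};

// Reference semantics: a Statement refers to *that* Account, not a copy.
class Statement {
public:
  explicit Statement(const Account& a) : account_(a) { }
  Money total() const { return account_.balance(); }
private:
  const Account& account_;
};

void semanticsDemo()
{
  Account acct;
  Statement s(acct);
  Money before = acct.balance();     // a copy: unaffected by later deposits
  acct.deposit(Money(250));
  assert(before == Money(0));
  assert(s.total() == Money(250));   // sees the one and only Account
}
```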

These are some of the many issues that come into play with real OO design.
OO/C++ mastery takes time and high quality training.
That's the investment-price you pay for a powerful tool.

	<<<<DON'T STOP NOW!  READ THE NEXT QUESTION TOO!!>>>>

==============================================================================

Q101: Does the poor performance of ref semantics mean I should pass-by-value?
A: No.  In fact, `NO!' :-)

The previous questions were talking about *subobjects*, not parameters.  Pass-
by-value is usually a bad idea when mixed with inheritance (larger subclass
objects get `sliced' when passed by value as a base class object).  Generally,
objects that are part of an inheritance hierarchy should be passed by ref or by
ptr, but not by value, since only then do you get the (desired) dynamic
binding.
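
Slicing is easy to see in a few lines.  In this sketch the names `Base',
`Derived', `byValue', `byRef' and `byPtr' are made up, and `override' assumes
a present-day compiler:

```cpp
#include <cassert>
#include <string>

class Base {
public:
  virtual ~Base() { }
  virtual std::string name() const { return "Base"; }
};
class Derived : public Base {
public:
  std::string name() const override { return "Derived"; }
};

// By value: the parameter is a Base, built from the Base part only (sliced).
std::string byValue(Base b)      { return b.name(); }

// By ref or by ptr: the object is still a Derived, so binding is dynamic.
std::string byRef(const Base& b) { return b.name(); }
std::string byPtr(const Base* b) { return b->name(); }

void slicingDemo()
{
  Derived d;
  assert(byValue(d) == "Base");      // the Derived part was sliced off
  assert(byRef(d)   == "Derived");
  assert(byPtr(&d)  == "Derived");
}
```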

Unless compelling reasons are given to the contrary, subobjects should be by
value and parameters should be by reference.  The discussion in the previous
few questions indicates some of the `compelling reasons' for when subobjects
should be by reference.

--
Marshall Cline
--
Marshall P. Cline, Ph.D. / Paradigm Shift Inc / PO Box 5108 / Potsdam NY 13676
cline@parashift.com / 315-353-6100 / FAX: 315-353-6110