On programming

While we’re digressing (will try to limit the extent… ps. I failed)…

As for programming, the (almost) lost art of Assembler is dismissed in many quarters as no longer relevant.

More money for me! You cannot truly understand how a computer operates without having that assembly language->machine code->bare metal thing click in your mind. Only people who have written at least SOME assembly, and have seen a CPU diagram or two… understand what a CPU is, and hence “how” a computer works. I agree this is sadly becoming a lost piece of understanding.

When writing fast, robust, numerically intensive solution engines for scientific or engineering applications, one can of course do this in a high-level language but the programmer has a significant advantage if s/he knows what the compiled code looks like at the CPU’s level. Compilers often blindly add bits of library code that the programmer may not even be aware of and that are unnecessary for the code in question.

Indeed, but as I think you imply, I’d still write it in good C++. The difference these days between that and raw assembly is negligible in all but the most extreme cases. Your I/O operations cost a lot more than CPU cycles, which can almost be seen as irrelevant. If you can do cache or memory optimisation… then yes, but from what I’ve read, understanding a modern cache system well enough to beat it with userland code is beyond mere mortals, and usually memory access will be governed by a (relatively) expensive call into your OS anyway, which may just decide to swap your highly hand-optimised memory access out to hard disk. Unlucky. In 99.9999999% of cases, even in some cases where people think they can do better, the compiler will be better. That part of the “forget assembly” argument I buy.

But then a (good) C/C++ compiler for a PIC microchip costs a buttload of money, out of the range of a hobbyist. So there I write assembly. But even on a 20 MHz PIC your code gets executed so freaking fast (no OS, no task switching, no nothing: your code, line by line, at 20 MHz is actually pretty amazing) that I’ve not yet found a situation where a PIC isn’t sitting idle most of the time. Hence they tend to build all kinds of idle switching, power-save modes, etc. into them; even if your code has to run once every 10 ms, the chip can still go to sleep, save some power, and wake up again in time to do its job.

Also, CPU-specific optimisations, like effective multi-pipelining of concurrent instruction streams or instruction set extensions, are often lacking even from the best compilers.

It’s (a very unfortunate) practicality thing… usually people will compile to target i686 or even earlier architectures to ensure backwards compatibility. Very seldom do you see someone custom-roll a bleeding-edge compile for the latest-and-greatest architectural advances. Perhaps more common in the sciences, but not very common in consumer software. AFAIK Intel’s compiler is the best when it comes to stuff like this (only logical), but I haven’t had the need to investigate this a lot.

It comes down to an understanding of what the code does at the CPU level, an understanding that can help in eliminating a host of problems and inefficiencies before they occur.

A nice example of a bad problem is memory alignment issues. If a C++ developer doesn’t understand what the compiler is going to do with certain data structures, he’s gonna have code that works on his machine (probably by fluke), and crash badly on another platform, and he may have no idea why.
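
Something like this is the classic trap (a made-up sketch, struct name and all, just to show the shape of it):

#include <cstdint>
#include <cstdio>
#include <cstring>

// Made-up record purely for illustration: the compiler inserts 3 bytes of
// padding after 'tag' so that 'value' lands on a 4-byte boundary, which is
// why sizeof(Record) is typically 8, not the 5 you might expect.
struct Record {
    std::uint8_t  tag;
    std::uint32_t value;
};

int main() {
    unsigned char buffer[64] = {0};   // e.g. bytes read off a file or socket

    // Works "by fluke" on x86, which tolerates misaligned loads; on stricter
    // platforms this kind of cast can crash outright:
    //   const Record* r = reinterpret_cast<const Record*>(buffer + 1);

    // Portable version: copy the bytes into a properly aligned object instead.
    Record r;
    std::memcpy(&r, buffer + 1, sizeof r);

    std::printf("sizeof(Record) = %zu\n", sizeof(Record));
    return 0;
}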

Permit me to derail a little more. (Maybe spawning a new thread is in order… ;D )

Much of what you say is true for most everyday applications. And of course there are the platform-independence and code maintenance issues.

You claim that IO ops are costly. This is true if your IO source/target is non-volatile storage. But many specialised scientific/engineering problems are amenable to being memory managed in such a way that misalignment wait-states, fragmented memory blocks and virtual memory access penalties are minimised or even eliminated altogether. This requires a detailed familiarity with the hardware architecture and the innards of the OS you’re coding for.

While on the subject of “good C++,” are you aware of how awfully expensive the object destructors and, especially, the constructors are? I assume you make regular use of structured exception handling too. Do you know what exception handling costs? These can be huge obstacles in badly-written numerical code.

There are certain types of specialised scientific/engineering problems whose numerical treatments are intensively repetitive, e.g. finite element, finite difference and optimisation problems. (For reference, you might like to consider that our Weather Bureau’s daily forecast run took between four and five hours on a Cray-2.) Often, one has a small set of core functions, each of which is called hundreds of millions or even billions of times during a solution run. These functions may themselves be iterative. Saving even a small percentage of the clock cycles these functions require can result in a significant saving in execution time. One admittedly extreme example resulted in a reduction from over two hours down to less than three minutes (and, as a bonus, the reworked solution engine was much less prone to pathological crashes/exceptions), but halving or even quartering the run time is not uncommon. Thus, you need to understand the nature of the problem you’re dealing with as well as the platform you’re going to solve it on.

I’m not saying that you have to write your entire program in highly optimised code. That would be a waste of much effort. However, where the nature of the problem warrants it, you should write the critical parts in carefully optimised code such as Assembler, and then link the assembled object code into the rest of your program using the high-level-language development environment (some of which allow you to make use of inline Assembler code). Experience has shown that the extra effort is worth it but it takes a practised eye to gauge this properly.

'Luthon64

so, ummm… How many programmers are on this forum??? :confused:

Professionally, I don’t program anymore although I do write task-specific codes when there is no easy way to use existing tools. I used to write such scientific and engineering codes in the manner I described earlier, though.

'Luthon64

I am. At the Joburg SITP meetings it’s like 80% of the people are in IT.
I’m not at the level that BM and Mefiante are; I’ve only ever written seriously in high-level languages (mainly .Net). I can recognize Assembler and C++ when I see them, but might have some trouble reading them.

.Net usually “compiles” to MSIL, not native (x86) code. MSIL is a bit like Java’s byte code in that it requires a platform-specific runtime environment.

'Luthon64

;D

You claim that IO ops are costly. This is true if your IO source/target is non-volatile storage.

Maybe not on a Cray, but it’s common to have PCs or servers (and those sometimes stick around for a while) where the memory runs at half the clock speed of the CPU or less. Things are improving, though.

But many specialised scientific/engineering problems are amenable to being memory managed in such a way that misalignment wait-states, fragmented memory blocks and virtual memory access penalties are minimised or even eliminated altogether. This requires a detailed familiarity with the hardware architecture and the innards of the OS you’re coding for.

Yeah, I think this is where the crux of this discussion lies. Hand-rolling assembly has application in specialised fields like this where you know your hardware and OS intimately; I completely agree. But for the “general-purpose guy” like me, even for server software, it’s usually about optimising DB access, network access, and the like, not so much about cycles. So we’re just coming from two different applications. And my point is a bit that the compiler writers cater to the application guys more, in my opinion.

While on the subject of “good C++,” are you aware of how awfully expensive the object destructors and, especially, the constructors are? I assume you make regular use of structured exception handling too. Do you know what exception handling costs?

Yes, and it depends. Constructor writers have to be prudent and use initialisation lists, and avoid assignments, to mitigate costs. Usually you want your constructor body to be empty; the compiler can do initialisation lists quite efficiently. If constructors are very slow, it’s probably because there’s more stuff in the object than is needed, leading to objects being constructed needlessly, hinting maybe at a bad design decision. One could also be writing code that creates too many temporaries, resulting in unwanted construction/destruction overhead… the liberal use of references can save one a good whack of unneeded temporaries.

Since we’re now completely off topic:

Constructor(std::string a) { this->a = a; }   // pass by value, then assign: an extra copy plus an assignment operator call

is way worse than writing:

Constructor(const std::string& a): a(a) { }   // const reference plus initialisation list: a single copy-construction, no temporaries

And a lot of people don’t realise that. Yes, I haven’t completely removed constructors, but you can remove a lot of them with some diligence. At the end of the day the memory needs to be initialised one way or the other, but it’s nice to only do it once. (Unless, as I suspect, in your domain you may have wanted to forgo it because you know exactly how it’s going to be used, and program very carefully.) I (actually we) don’t use exceptions regularly. We try to constrain those to TRULY “exceptional” cases (network errors, OS errors…); for general error handling we avoid them, not only for performance, but because certain guarantees are impossible to enforce when you allow exceptions.

BUT that is only what the language gives you. It’s possible to write a block memory allocator in pure C++ that performs. It’s a tricky task to make the compiler do a lot of the heavy lifting, usually involving “bending” the template language… but I know a person who did this; I don’t know much about it, though.
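
For flavour, a bare-bones version of the idea might look something like this (my own made-up sketch, not his code, and without any of the template trickery):

#include <cstddef>
#include <new>
#include <vector>

// Bare-bones fixed-size block pool: one slab is grabbed up front and carved
// into equal chunks threaded onto a free list, so allocate()/deallocate() are
// a couple of pointer moves instead of trips to the general-purpose heap.
template <std::size_t BlockSize, std::size_t BlockCount>
class BlockPool {
    union Slot {
        Slot* next;                      // used while the slot is free
        unsigned char bytes[BlockSize];  // used while the slot is handed out
    };

public:
    BlockPool() : slab_(BlockCount), free_(nullptr) {
        for (std::size_t i = 0; i < BlockCount; ++i) {
            slab_[i].next = free_;       // thread every slot onto the free list
            free_ = &slab_[i];
        }
    }

    void* allocate() {                   // O(1): pop the head of the free list
        if (!free_) throw std::bad_alloc();
        Slot* s = free_;
        free_ = s->next;
        return s->bytes;
    }

    void deallocate(void* p) {           // O(1): push the slot back onto the list
        Slot* s = static_cast<Slot*>(p);
        s->next = free_;
        free_ = s;
    }

private:
    std::vector<Slot> slab_;
    Slot* free_;
};

You’d instantiate it as, say, BlockPool<sizeof(Node), 4096> and route a class’s allocations through it; the clever part (which I can’t reproduce) is letting templates work out block sizes and alignments at compile time, which is where the “bending” comes in.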

Oh, and on destruction… this is a trade-off. It sounds like you were working with large areas of contiguous memory that could be nicely block-allocated and managed. But in our problem space, what is needed is often unpredictable up front. So you have the option: destruction or garbage collection. I prefer destruction because the deallocation, or disconnect/cleanup, code is going to happen anyway, albeit in a cascading fashion. I’d suspect, though, that clean nested destructors devoid of unnecessary (non-deallocation) code could be optimised into a single deallocation by the compiler. I’d have to read up, though.

...Thus, you need to understand the nature of the problem you’re dealing with as well as the platform you’re going to solve it on.

Bingo. I do understand that in certain science disciplines this is entirely necessary. The shift seems to be going in the direction of offloading repetitive tasks like those onto GPUs, which have their own languages and compilers to optimise their efficiency, and which are very efficient at massively parallel processing (I hear).

I’m not saying that you have to write your entire program in highly optimised code. That would be a waste of much effort. However, [u][i]where the nature of the problem warrants it[/i][/u], you should write the critical parts in carefully optimised code such as Assembler, and then link the assembled object code into the rest of your program using the high-level-language development environment (some of which allow you to make use of inline Assembler code).

For us cross-platform guys it’s not usually an option. The case would have to be really extreme for us to consider having specialised asm code for every architecture. And hence this discussion… horses for courses. But lemme let you in on a little secret: we DO have an atomic integer implemented in asm for every platform we support; it’s a shitload more efficient than locking when it’s applicable.
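
Roughly the sort of thing I mean, in x86/GCC form (a sketch, not our actual code; each platform gets its own variant, and these days something like std::atomic<int> gives you much the same thing portably):

// Sketch of an atomic fetch-and-add for x86 using GCC-style inline asm.
// "lock xadd" exchanges and adds in one indivisible step, so no mutex is
// needed just to bump a counter shared between threads.
inline int atomic_fetch_add(volatile int* target, int delta) {
    int old = delta;
    __asm__ __volatile__(
        "lock; xaddl %0, %1"          // old <- *target; *target <- *target + delta
        : "+r"(old), "+m"(*target)
        :
        : "memory");
    return old;                       // the value *before* the addition
}

// Typical use: a reference count or statistics counter, e.g.
//   volatile int counter = 0;
//   atomic_fetch_add(&counter, 1);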

To my shock I realised, upon touching a Windows compiler again the other day, that by default the new versions of MSVC++ compile to byte-code too! No kidding: I fire up the exe and .Net launches in the background? What the hell? Luckily I found a way to turn it back to “good old fashioned” binary building. :confused:

But that’s just my point: If you properly understand the CPU’s prefetching & trace caching, pipelining and memory caching, and you schedule your instruction streams properly over the entire stretch of critical code, these clock penalties simply won’t apply at all because there’s nothing the CPU has to wait for. Everything’s in the queue/cache already by the time it is needed. Put another way, upgrading to faster RAM won’t do much for the performance of such optimised code. (Of course, in practice it’s difficult to achieve 100% CPU efficiency, but 95%+ is doable.)

The codes I’m referring to are normally highly specialised and are used by a small fraternity only. The programmer knows what the target hardware is (or s/he is in a position to specify it). It would be a different matter if wide platform coverage were required, but this isn’t usually so. Also, code optimisations that work well for one generation of hardware are often well optimised for subsequent generations. The standout exception to this was Intel’s early P4 CPU: it ran P-III-optimal code more slowly than the P-III, even at 50% higher clock speed. At around that time, AMD bit a sizeable chunk out of Intel’s pie.

Another point to note is that there is a certain class of problems that cannot be efficiently parallelised. That is, throwing a problem of this kind at a computing cluster won’t give appreciably faster results than running it on an autonomous single-CPU box of the same hardware configuration. So once again the message is that you need to understand the nature of the problem you’re dealing with.
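
A trivial illustration of why: when each step of an iteration needs the result of the previous step, there is simply nothing to hand out to the other processors. (A made-up recurrence, purely to show the shape of the problem.)

#include <cstdio>

int main() {
    double x = 0.5;
    for (long i = 0; i < 100000000L; ++i) {
        // Loop-carried dependency: step i cannot begin until step i-1 has
        // finished, so adding CPUs (or cluster nodes) buys you nothing here.
        x = 3.7 * x * (1.0 - x);
    }
    std::printf("%f\n", x);
    return 0;
}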

Agreed in all respects, except to reiterate that even for general-purpose programming, it’s still a real benefit if the programmer understands what the high-level code’s going to be doing on the CPU.

One of the most horrendous bits of OOP I’ve ever seen was actually done by a computer science graduate. In one function that was typically called many millions of times, there was a “while” loop. Within the scope of this “while” construct, an object of a certain class was instantiated for the sake of one method that was needed, and then dutifully destroyed again. The object in question wasn’t used anywhere else in the code, either directly or via descendant classes. When you encounter something so obviously inept, you have to wonder how on earth the offender managed to graduate.
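
Schematically, with invented names (and a stand-in class far lighter than the real offender, whose constructor did a good deal of work), it amounted to this:

// Invented names; the real class was much heavier than this stand-in.
struct Helper {
    double evaluate(double x) const { return 0.5 * (x + 2.0 / x); }
};

double bad(double x) {
    for (int i = 0; i < 1000000; ++i) {
        Helper h;                 // constructed on every single pass...
        x = h.evaluate(x);        // ...for the sake of this one call...
    }                             // ...and destroyed again at the end of the scope
    return x;
}

double better(double x) {
    Helper h;                     // hoisted out of the loop; better still, make
    for (int i = 0; i < 1000000; ++i)   // evaluate() a free or static function
        x = h.evaluate(x);              // and dispense with the object entirely
    return x;
}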

'Luthon64

Anyone for COBOL? :wink:

Like Mandarb, I’m not at the level of assembler (way too lazy) and I never got my head around C or C++, but I have made my living programming in BASIC (post-COBOL, LOL!), contract programming and selling shareware, way back when. Now just hobby projects for me and friends.

I still use Fortran for much of this.

In primary school, I used to make a tiny unconvincing turtle cross the screen, turn left and go beep.
But with the arrival of GW-BASIC and the XT somewhere in the ’80s, the sky became the limit. The jewel in the crown of my programming career was a sequence that caused the computer to break wind when switched on. Ah… those were the days! :slight_smile:

Mintaka

That’s fine if you’re happy with standalone console applications and hand-prepped input data files (where applicable). As with every high-level language, FORTRAN compilers do not necessarily produce optimal code.

If it’s your aim to integrate FORTRAN object code into a program much of which is written in another high-level language, there are some problems concerning calling conventions and parameter passing. Moreover, if your code passes matrices (2-D arrays) between a different high-level language and FORTRAN, you are inviting disaster. In FORTRAN, matrices are stored column-major (column-dominant), whereas they are row-major in most other high-level languages, C and C++ included. This means that you have to reorder your matrices explicitly both before and after the FORTRAN code operates on them. Many an unwary scientific programmer has come a totally baffled cropper on this subtle point.
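
In concrete terms, the reordering looks something like this (a sketch only: fsolve_ stands in for whatever FORTRAN routine you’re calling, and the exact name decoration and calling convention depend on your particular compiler pairing):

#include <cstddef>
#include <vector>

// Hypothetical FORTRAN routine, compiled separately and linked in.
extern "C" void fsolve_(double* a, int* rows, int* cols);

// C/C++ stores a[i][j] row by row; FORTRAN expects the same matrix column by
// column, so the data must be reordered before the call and back afterwards.
void call_fortran(std::vector<double>& a, int rows, int cols) {
    std::vector<double> colMajor(a.size());

    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            colMajor[std::size_t(j) * rows + i] = a[std::size_t(i) * cols + j];

    fsolve_(colMajor.data(), &rows, &cols);   // FORTRAN also passes by reference

    for (int i = 0; i < rows; ++i)            // and reorder the results back
        for (int j = 0; j < cols; ++j)
            a[std::size_t(i) * cols + j] = colMajor[std::size_t(j) * rows + i];
}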

'Luthon64

Sjoe, thanks but no thanks; FORTRAN and Perl serve my needs just fine. Running symplectic integrators overnight on the server is “good enough”. Injecting FORTRAN into what? C++? Urrgg, heaven forbid :wink: Assembler into PASCAL, however, is fun. Remember the good ol’ days of 1k–4k demo comps?

The last I’ve heard on the subject (I’ve been out of this game for several years now), there was the 64k Challenge where competitors (individuals and teams) effectively got locked into a hotel room for a whole week together with some PCs bristling with low-level tools and assorted programming environments. At the end of the week they had to produce a code image of 64 kB or smaller. Code compactors/decompressors are OK as long as they are coded by the programmer(s). The slickest app would win quite a prestigious prize. Rumour has it that the standard diet for the competitors is pizzas and pancakes: it’s the only food they could slip under the door… :wink:

Then, for the diehard DOS/Real Mode gurus, there was the 256-byte challenge. I don’t know if that one’s still going, but the objective was to produce the coolest app with a code image of 256 bytes or less (inevitably, a .COM file). Some of the things these guys did in that tiny space are simply amazing. For comparison, the smallest possible MS Windows GUI program is 512 bytes in size, and all it does is something trivial like pop up a “Hello World!” message box. It needs to be assembled and linked by hand (which is a fancy way of saying that you need to write out the binary by hand) because the smallest PE section boundary that linkers recognise is 1 kB (= 1,024 bytes).

'Luthon64

I see there are JavaScript demo comps these days. :slight_smile: Not very impressive, but then I guess it is JS.

Check this out, 1k demos still going strong (turn up your volume, requires sound)

Tracie by TBC

Hooray for COBOL and FORTRAN! Wasn’t COBOL designed for business/finance related problems?

Interestingly (at least, for me) there is lots and lots and lots of fresh and legacy scientific code out there that is still written in FORTRAN - even the FORTRAN-4 column dependent kind.

I had a bit of a nightmare some years ago trying to use “Digital” FORTRAN that promised to allow me to make nice, Windows user interfaces. Big disaster! Had to write endless code just to create the usual “Hello world” app. I gave that up as a joke and then tried Visual Basic for the interface passing parameters to a FORTRAN DLL to do the number crunching. Yuk yuk yuk.

After a few months of hair-pulling, my S/O introduced me to Delphi. I first tried the parameter-passing thing but bashed my head against the row-vs-column-dominant problem, so I switched to Delphi completely. Delphi has been great and allows me to paddle in both the shallow and deep ends of the pool. But I miss “Numerical Recipes”…

I used to be (database applications), but now I run a gym.

One of the most horrendous bits of OOP I’ve ever seen was actually done by a computer science graduate. ..... you have to wonder how on earth the offender managed to graduate.

Welcome to the headache that is developer hiring. A degree don’t make a programmer, no ma’am. The world over, industry is left disillusioned by what is churned out by universities. Somehow I always get the feeling that varsity professors have NO IDEA what industry wants and needs. The cynic in me thinks they don’t care; they’re teaching SCIENCE, damnit! Not craft!

The point is, if you can code your way to getting the correct result for an assignment, you get a passing mark and move on, no matter how horrendous your code is (as a tutor I once got rapped over the knuckles for being “too hard” on the students’ projects).

In fact it’s the craft that is sought after; a varsity that achieves it will have awesome post-degree hiring rates at higher salaries, guaranteed.

Quite. I have an arts degree (English & Philosophy) and a Master Mariner’s certificate. When I wanted to go into IT I went to Van Zyl & Pritchard (are they still going?) and did a COBOL course. I’ve never used COBOL in anger; my first job was Informix 4GL, and since then I’ve used Progress, Python and Oracle too. A six-month course that concentrates on the practical aspects of programming trumps a CS degree unless you want to go and work in Intel’s R&D department or something similar. Businesses want to send their customers invoices; they don’t want to ‘model the real world’.