Profiling LMC Assembly Code

What is Profiling?

Profiling is measuring the performance of code, whether it is a small function or an entire application. You can determine how quickly the code will execute various requests and tune them to meet your needs.

Cycle Counts

The Little Man Computer is easy to do math on, it is 1:1 — one cycle is one operation (though, if you look at the underlying code that performs those functions, some take more steps than others). In a real computer, something may take several cycles to perform.

If you have a 1MHz processor, you’re probably quite old. I know, I have a MOS 6502. That’s 1 million cycles a second clock speed — but if one of your instructions takes two cycles (or more) then that clock speed is less useful for figuring out how fast a computer is — but can really help you tune your program.

For the sake of this discussion, all LMC clock cycles and instruction cycles are the same thing, and each take exactly the same amount of time*.

*Nerd stuff: This isn’t accurate, the code is ran procedural. A real computer would have some sort of clock generator (usually a crystal oscillator) that is used to time everything, and that clock trigger impulse tells the CPU to iterate the Program Counter and move onto the next step. In this case, the program counter is tied to the operations we’re doing (adding, reading, subtracting, etc) and there is minor variability in these steps.

Let’s do Math!

You might remember the OpCode table, where the only math we can do on an LMC is add and subtract — you might need do occasionally do multiplication — but how?

It’s quite easy! Let’s start by doing 2 * 2:

2 + 2 = 4

That seems like it would be easy, lets just go ahead and take the first number and multiply it by the second!

Let’s do 20 * 2:

20 + 20 = 40

Let’s do 2 * 20:

2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 = 40

Wow… Same exact question, same answer, very different results. So, what is the best approach? Let’s build a simple multiplication program and then tune it.

// Multiply Two Numbers, Slowly - R.Lerner 2023/10/28
// Get Numbers to Multiply
	INP
	STA a
	INP
	STA b
	LDA a
		
// Store Ouput Number
	STA output
		
// Subtract 1 from "b", so 5x5 is now 5x4, and the output = 5
	LDA b
	SUB one
	STA b
		
// Since there's no multiplication operand, we have to loop ANDs
MULT	BRZ DONE// If zero is in var B jump to done
	LDA output
	ADD a
	STA output
	LDA b 	// Set accum. to "b" for MULT / BRZ compare
	SUB one
	STA b
	BRA MULT// Jump back to multiply loop until b is at zero.

// Done doing math, output
DONE	LDA output
	OUT
	HLT
	a DAT
	b DAT
	output DAT
	one DAT 1

I know all of you folks that use higher level languages are saying “gross, I can just do var x = 2*20; omg.

Yes, you can, but your computer is still doing what I show above and somebody wrote that at some point.

Let’s look at how many operations things take here:

Multiplication	Cycles Required
1*10	85
10*1	13 **
102*	21 ***
5*100	805
100*5	45
20*20	400

** – For brevity, here’s what is happening with 10*1 during those 13 cycles:

(Entered “10” on keyboard)

1: [10 / +] Read Input to accumulator
2: [10 / +] Setting RAM location ’20’ to accumulator’s value

(Entered “1” on keyboard)
3: [1 / +] Read Input to accumulator
4: [1 / +] Setting RAM location ’21’ to accumulator’s value
5: [1 / +] Load value at location 20 into the accumulator
6: [10 / +] Setting RAM location ’22’ to accumulator’s value
7: [10 / +] Load value at location 21 into the accumulator
8: [1 / +] Subtracting Value 1 from 23 from the accumulator
9: [0 / +] Setting RAM location ’21’ to accumulator’s value
17: [0 / +] Break if Accumulator = Zero to location 17
18: [0 / +] Load value at location 22 into the accumulator
19: [10 / +] Add accumulator to Output
20: [10 / +] Halting
20: [10 / +] 13 total instruction(s).

*** – How about 10*2? I scratched off the items that are in 10*1:

(Entered “10” on keyboard)
1: [10 / +] Read Input to accumulator
2: [10 / +] Setting RAM location ’20’ to accumulator’s value
(Entered “2” on keyboard)
3: [2 / +] Read Input to accumulator
4: [2 / +] Setting RAM location ’21’ to accumulator’s value
5: [2 / +] Load value at location 20 into the accumulator
6: [10 / +] Setting RAM location ’22’ to accumulator’s value
7: [10 / +] Load value at location 21 into the accumulator
8: [2 / +] Subtracting Value 1 from 23 from the accumulator
9: [1 / +] Setting RAM location ’21’ to accumulator’s value
10: [1 / +] Break if Accumulator = Zero to location 17
11: [1 / +] Load value at location 22 into the accumulator
12: [10 / +] Adding Value 10 from 20 to accumulator
13: [20 / +] Setting RAM location ’22’ to accumulator’s value
14: [20 / +] Load value at location 21 into the accumulator
15: [1 / +] Subtracting Value 1 from 23 from the accumulator
16: [0 / +] Setting RAM location ’21’ to accumulator’s value
17: [0 / +] Jump to 09
17: [0 / +] Break if Accumulator = Zero to location 17
18: [0 / +] Load value at location 22 into the accumulator
19: [20 / +] Add accumulator to Output
20: [20 / +] Halting
20: [20 / +] 21 total instruction(s).

What changed between 10*1 and 10*2?

Like we mentioned above, we have to do things differently:

10 * 1 is just “10”

10 * 2 is 10+10

In this case, I took care to put the larger number first, so the jump from 10*1 to 10*2 was negligible, but since we have to ADD each number, we have to loop over and over again until one of these numbers are zero.

Here’s another table to give you an idea:

Anything multiplied by:	Cycle Count
* 0	21
* 1	13
* 2	21
* 3	29
* 4	37
* 5	45

You can see, after *0, each time we jump 8 more instructions. For this reason, larger numbers will take more work to multiply if they’re the second number used.

Now this would be fine, but users aren’t always that quick and don’t read manuals. Because your code might be used by a company you have stock in, you want it to perform the best possible.

So that’s the end of profiling that code. Maybe it is time to write some new code?

Multiplication v2

Here’s some code I wrote over a year ago, it makes a few minor changes that make a massive difference:

If one of the numbers entered is a zero, the program returns zero and exits
If the second number provided is larger, it is swapped with the first number, and then the math is done

These changes mean that it is actually slower than the first example for people who know to put the large number first — but it’s also significantly faster than the first for all other use cases.

// Multiply Two Numbers - R.Lerner 2022/07/23 - Reduces cycle count by multiplying large numbers instead of small.

	INP
	STA a
	BRZ ZERO // If zero value, output zero
	INP
	STA b
	BRZ ZERO // If zero value, output zero
	SUB a
	BRP SWAPVAR // Branch to swap a & b
	BRA DOMATH
SWAPVAR	LDA b
	STA swap
	LDA a
	STA b
	LDA swap
	STA a
DOMATH	LDA a
	STA output
	LDA b
	SUB one
	STA b
MULT	BRZ DONE // If zero is in var B jump to done
	LDA output
	ADD a
	STA output
	LDA b	// Set accum. to "b" for MULT / BRZ compare
	SUB one
	STA b
	BRA MULT 	// Jump back to multiply loop
ZERO	OUT
	HLT
DONE	LDA output
	OUT
	HLT
	a DAT
	b DAT
	output DAT
	one DAT 1
	swap DAT

Multiplication	Cycles Required Version 2	Cycles Required Version 1
1*10	23	85
10*1	18	13
102*	21	26
5*100	55	805
100*5	50	45
20*20	175	165

You’ll notice that version 2 has DRAMATIC savings in some areas, but also takes longer because it has to perform all of the additional checks in order to accommodate the incorrect usage.

I consider this an overall success, unless you have a use case where you have:

Users that always RTFM, and actually follow it
Fault / risk tolerance for the times they forget or do not follow it
Critically short runtime

Comments

How can you improve it? I’d love to see some code samples in the comments!

Giveaway

I’m going to be reading the comments in this blog series, and when it concludes, I will reach out to the person with the best comment or question. I have one copy of the below book I’ll send out to that person if they will share their address with me. It’s not my book, I just have two of them and want to give one away to somebody who is interested.

Profiling LMC Assembly Code

Comments

Leave a Reply Cancel reply