What is Profiling?

Profiling is measuring the performance of code, whether it is a small function or an entire application. You can determine how quickly the code will execute various requests and tune them to meet your needs.

Cycle Counts

The Little Man Computer is easy to do math on, it is 1:1 — one cycle is one operation (though, if you look at the underlying code that performs those functions, some take more steps than others). In a real computer, something may take several cycles to perform.

If you have a 1MHz processor, you’re probably quite old. I know, I have a MOS 6502. That’s 1 million cycles a second clock speed — but if one of your instructions takes two cycles (or more) then that clock speed is less useful for figuring out how fast a computer is — but can really help you tune your program.

For the sake of this discussion, all LMC clock cycles and instruction cycles are the same thing, and each take exactly the same amount of time*.

*Nerd stuff: This isn’t accurate, the code is ran procedural. A real computer would have some sort of clock generator (usually a crystal oscillator) that is used to time everything, and that clock trigger impulse tells the CPU to iterate the Program Counter and move onto the next step. In this case, the program counter is tied to the operations we’re doing (adding, reading, subtracting, etc) and there is minor variability in these steps.

Let’s do Math!

You might remember the OpCode table, where the *only* math we can do on an LMC is add and subtract — you might need do occasionally do multiplication — but how?

It’s quite easy! Let’s start by doing 2 * 2:

2 + 2 = 4

That seems like it would be easy, lets just go ahead and take the first number and multiply it by the second!

**Let’s do 20 * 2:**

20 + 20 = 40

**Let’s do 2 * 20:**

2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 = 40

Wow… Same exact question, same answer, very different results. So, what is the best approach? Let’s build a simple multiplication program and then tune it.

```
// Multiply Two Numbers, Slowly - R.Lerner 2023/10/28
// Get Numbers to Multiply
```**INP
STA a
INP
STA b
LDA a**
// Store Ouput Number
**STA output**
// Subtract 1 from "b", so 5x5 is now 5x4, and the output = 5
**LDA b
SUB one
STA b**
// Since there's no multiplication operand, we have to loop ANDs
**MULT BRZ DONE**// If zero is in var B jump to done
**LDA output
ADD a
STA output**
** LDA b** // Set accum. to "b" for MULT / BRZ compare
** SUB one**
** STA b**
** BRA MULT**// Jump back to multiply loop until b is at zero.
// Done doing math, output
**DONE LDA output
OUT
HLT
a DAT
b DAT
output DAT
one DAT 1**

I know all of you folks that use higher level languages are saying “gross, I can just do

omg.**var x = 2*20;**

Yes, you can, but your *computer* is still doing what I show above and somebody wrote that at some point.

Let’s look at how many operations things take here:

Multiplication | Cycles Required |

1*10 | 85 |

10*1 | 13 ** |

10*2 | 21 *** |

5*100 | 805 |

100*5 | 45 |

20*20 | 400 |

** – For brevity, here’s what is happening with 10*1 during those 13 cycles:

(Entered “10” on keyboard)

1: [10 / +] Read Input to accumulator

2: [10 / +] Setting RAM location ’20’ to accumulator’s value

(Entered “1” on keyboard)

3: [1 / +] Read Input to accumulator

4: [1 / +] Setting RAM location ’21’ to accumulator’s value

5: [1 / +] Load value at location 20 into the accumulator

6: [10 / +] Setting RAM location ’22’ to accumulator’s value

7: [10 / +] Load value at location 21 into the accumulator

8: [1 / +] Subtracting Value 1 from 23 from the accumulator

9: [0 / +] Setting RAM location ’21’ to accumulator’s value

17: [0 / +] Break if Accumulator = Zero to location 17

18: [0 / +] Load value at location 22 into the accumulator

19: [10 / +] Add accumulator to Output

20: [10 / +] Halting**20: [10 / +] 13 total instruction(s).**

*** – How about 10*2? I scratched off the items that are in 10*1:

~~(Entered “10” on keyboard)~~

1: [10 / +] Read Input to accumulator

2: [10 / +] Setting RAM location ’20’ to accumulator’s value

(Entered “2” on keyboard)

3: [2 / +] Read Input to accumulator

4: [2 / +] Setting RAM location ’21’ to accumulator’s value

5: [2 / +] Load value at location 20 into the accumulator

6: [10 / +] Setting RAM location ’22’ to accumulator’s value

7: [10 / +] Load value at location 21 into the accumulator

8: [2 / +] Subtracting Value 1 from 23 from the accumulator

9: [1 / +] Setting RAM location ’21’ to accumulator’s value

10: [1 / +] Break if Accumulator = Zero to location 17

11: [1 / +] Load value at location 22 into the accumulator

12: [10 / +] Adding Value 10 from 20 to accumulator

13: [20 / +] Setting RAM location ’22’ to accumulator’s value

14: [20 / +] Load value at location 21 into the accumulator

15: [1 / +] Subtracting Value 1 from 23 from the accumulator

16: [0 / +] Setting RAM location ’21’ to accumulator’s value

17: [0 / +] Jump to 09

17: [0 / +] Break if Accumulator = Zero to location 17

18: [0 / +] Load value at location 22 into the accumula~~tor~~

19: [20 / +] Add accumulator to Output

20: [20 / +] Halting**20: [20 / +] 21 total instruction(s).**

What changed between 10*1 and 10*2?

Like we mentioned above, we have to do things differently:

10 * 1 is just “10”

10 * 2 is 10+10

In this case, I took care to put the larger number first, so the jump from 10*1 to 10*2 was negligible, but since we have to ADD each number, we have to loop over and over again until one of these numbers are zero.

Here’s another table to give you an idea:

Anything multiplied by: | Cycle Count |

* 0 | 21 |

* 1 | 13 |

* 2 | 21 |

* 3 | 29 |

* 4 | 37 |

* 5 | 45 |

Now this would be fine, but users aren’t always that quick and don’t read manuals. Because your code might be used by a company you have stock in, you want it to perform the best possible.

So that’s the end of profiling that code. Maybe it is time to write some new code?

Multiplication v2

Here’s some code I wrote over a year ago, it makes a few minor changes that make a massive difference:

- If one of the numbers entered is a zero, the program returns zero and exits
- If the second number provided is larger, it is swapped with the first number, and then the math is done

These changes mean that it is actually *slower* than the first example for people who know to put the large number first — but it’s also significantly faster than the first for all other use cases.

```
// Multiply Two Numbers - R.Lerner 2022/07/23 - Reduces cycle count by multiplying large numbers instead of small.
```**INP
STA a
BRZ ZERO** // If zero value, output zero
**INP
STA b
BRZ ZERO** // If zero value, output zero
**SUB a
BRP SWAPVAR** // Branch to swap a & b
**BRA DOMATH
SWAPVAR LDA b
STA swap
LDA a
STA b
LDA swap
STA a
DOMATH LDA a
STA output
LDA b
SUB one
STA b
MULT BRZ DONE** // If zero is in var B jump to done
**LDA output
ADD a
STA output
LDA b **// Set accum. to "b" for MULT / BRZ compare
**SUB one
STA b
BRA MULT** // Jump back to multiply loop
**ZERO OUT
HLT
DONE LDA output
OUT
HLT
a DAT
b DAT
output DAT
one DAT 1
swap DAT**

Multiplication | Cycles RequiredVersion 2 | Cycles RequiredVersion 1 |

1*10 | 23 | 85 |

10*1 | 18 | 13 |

10*2 | 21 | 26 |

5*100 | 55 | 805 |

100*5 | 50 | 45 |

20*20 | 175 | 165 |

I consider this an overall success, unless you have a use case where you have:

- Users that always RTFM, and actually follow it
- Fault / risk tolerance for the times they forget or do not follow it
- Critically short runtime

Comments

How can you improve it? I’d love to see some code samples in the comments!

Giveaway

I’m going to be reading the comments in this blog series, and when it concludes, I will reach out to the person with the best comment or question. I have one copy of the below book I’ll send out to that person if they will share their address with me. It’s not my book, I just have two of them and want to give one away to somebody who is interested.

## Leave a Reply