As everyone knows, shifting a 24-bit variable (3 bytes) right takes 3 instructions per bit shifted when done straight-line, without a loop.

Per bit:

**ASR ByteH** ; if signed, or **LSR ByteH** if unsigned

**ROR ByteM**

**ROR ByteL**
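
To make the carry chain explicit, here is a small C model of the three-instruction step for the unsigned (LSR) case. It is only a sketch: the function name `shr24_bitwise` and the packing of the three bytes into the low 24 bits of a `uint32_t` are my own assumptions, not part of the routine above.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 24-bit right shift, one bit per pass, mirroring
   LSR ByteH / ROR ByteM / ROR ByteL.
   Assumption: x holds the value in its low 24 bits. */
uint32_t shr24_bitwise(uint32_t x, unsigned n)
{
    uint8_t h = (uint8_t)(x >> 16);
    uint8_t m = (uint8_t)(x >> 8);
    uint8_t l = (uint8_t)x;

    assert(x <= 0xFFFFFFu && n <= 24);

    while (n--) {
        uint8_t carry = h & 1u;                  /* LSR: bit 0 falls into carry */
        h >>= 1;                                 /* a 0 enters bit 7 */

        uint8_t next = m & 1u;                   /* ROR: carry enters bit 7 */
        m = (uint8_t)((m >> 1) | (carry << 7));
        carry = next;

        l = (uint8_t)((l >> 1) | (carry << 7));  /* ROR on the low byte */
    }
    return ((uint32_t)h << 16) | ((uint32_t)m << 8) | l;
}
```

Each pass of the loop corresponds to one LSR/ROR/ROR group, so the instruction count grows linearly with n.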

Shifting by 6 bits this way takes 18 clock cycles (6 bits x 3 instructions). If this is part of a routine that repeats often, it can easily eat time.

Using reciprocal multiplication can save some cycles.

Shifting ABC (24 bits) right by "n" bits is the same as computing ABC x 2^(8-n), ignoring the least significant byte of the 32-bit product, and storing the remaining three bytes back into ABC.

For "n" from 4 to 7 it saves clock cycles.

If ABC is signed and negative, the top n bits of the resulting "A" must be padded with "1" (sign extension); this step is not shown below.
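
A minimal C sketch of that padding step, assuming the 24-bit two's-complement value sits in the low 24 bits of a `uint32_t` and n is between 1 and 7. The name `shr24_signed` is hypothetical, and the unsigned result is modelled here with plain wide-integer arithmetic rather than the byte-by-byte routine.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 24-bit right shift built from the unsigned result plus padding.
   Assumptions: x is a 24-bit two's-complement value in the low 24 bits,
   n is in 1..7. */
uint32_t shr24_signed(uint32_t x, unsigned n)
{
    assert(x <= 0xFFFFFFu && n >= 1 && n <= 7);

    /* Unsigned result: x * 2^(8-n) with the low byte dropped. */
    uint32_t r = (x << (8 - n)) >> 8;

    if (x & 0x800000u) {
        /* Original value was negative: set the top n bits of byte A. */
        uint32_t pad = (0xFFu << (8 - n)) & 0xFFu;
        r |= pad << 16;
    }
    return r;
}
```

For example, 0xFFFF00 (that is, -256) shifted by 4 gives 0xFFFFF0 (that is, -16). In assembly the padding could be a conditional OR on A, at the cost of a few extra cycles on top of the unsigned routine.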

If n=4, D=0x10

If n=5, D=0x08

If n=6, D=0x04
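
The identity can be checked quickly in C before worrying about the byte-level version. This is just a sketch under my own assumptions: the value is kept in the low 24 bits of a `uint32_t`, and `shr24_mul` is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 24-bit right shift by n (1..7) via one multiply by D = 2^(8-n). */
uint32_t shr24_mul(uint32_t x, unsigned n)
{
    assert(x <= 0xFFFFFFu && n >= 1 && n <= 7);

    uint8_t  d = (uint8_t)(1u << (8 - n)); /* n=4 -> 0x10, n=5 -> 0x08, n=6 -> 0x04 */
    uint32_t product = x * d;              /* at most 31 bits, so it fits */
    return product >> 8;                   /* ignore the least significant byte */
}
```

With x = 0x123456 and n = 4, D is 0x10, the product is 0x01234560, and dropping the low byte leaves 0x012345.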

              A     B     C
    x                     D
    ---   ---   ---   ---
                CDH   /
    ADH   ADL         /
          BDH   BDL   /
    ---   ---   ---   ---
    A     B     C     /

Option A (10 clock cycles), assuming a register ZERO that holds 0:

    MUL A, D
    MOV A, R1      ; ADH
    MOV X, R0      ; ADL
    MUL C, D
    MOV C, R1      ; CDH
    MUL B, D
    MOV B, X       ; ADL
    ADD C, R0      ; CDH + BDL
    ADC B, R1      ; ADL + BDH + cy
    ADC A, ZERO    ; ADH + cy
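
For anyone who wants to check the sequence on a PC, here is a C model that follows Option A instruction by instruction for the unsigned case. It is a sketch, not a cycle-accurate simulation: `shr24_option_a`, `tmp` (standing in for X), and the `p`/`cy` variables (standing in for R1:R0 and the carry flag) are my own naming.

```c
#include <assert.h>
#include <stdint.h>

/* Model of Option A for unsigned values.
   Assumptions: x is in the low 24 bits, n is in 1..7. */
uint32_t shr24_option_a(uint32_t x, unsigned n)
{
    assert(x <= 0xFFFFFFu && n >= 1 && n <= 7);

    uint8_t a = (uint8_t)(x >> 16), b = (uint8_t)(x >> 8), c = (uint8_t)x;
    uint8_t d = (uint8_t)(1u << (8 - n));
    uint16_t p, sum;                      /* p models R1:R0 */
    uint8_t tmp, cy;                      /* tmp models X, cy the carry flag */

    p = (uint16_t)(a * d);                /* MUL A, D */
    a = (uint8_t)(p >> 8);                /* MOV A, R1    ; ADH */
    tmp = (uint8_t)p;                     /* MOV X, R0    ; ADL */
    p = (uint16_t)(c * d);                /* MUL C, D */
    c = (uint8_t)(p >> 8);                /* MOV C, R1    ; CDH */
    p = (uint16_t)(b * d);                /* MUL B, D */
    b = tmp;                              /* MOV B, X     ; ADL */

    sum = (uint16_t)(c + (uint8_t)p);     /* ADD C, R0    ; CDH + BDL */
    c = (uint8_t)sum;
    cy = (uint8_t)(sum >> 8);
    sum = (uint16_t)(b + (p >> 8) + cy);  /* ADC B, R1    ; ADL + BDH + cy */
    b = (uint8_t)sum;
    cy = (uint8_t)(sum >> 8);
    a = (uint8_t)(a + cy);                /* ADC A, ZERO  ; ADH + cy */

    return ((uint32_t)a << 16) | ((uint32_t)b << 8) | c;
}
```

One thing the model makes visible: because D is a power of two, CDH only occupies the low 8-n bits and BDL only the high n bits (likewise ADL and BDH), so the two additions never produce a carry.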

Option B, the same instructions in a different order, same clock cycles:

    MUL C, D
    MOV C, R1      ; CDH
    MUL A, D
    MOV A, R1      ; ADH
    MOV X, R0      ; ADL
    MUL B, D
    MOV B, X       ; ADL
    ADD C, R0      ; CDH + BDL
    ADC B, R1      ; ADL + BDH + cy
    ADC A, ZERO    ; ADH + cy

Any other ideas for doing it in fewer than 10 clock cycles?

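For unsigned values, the example above takes the same 10 instructions no matter the "n", which is 10 clock cycles if every instruction takes one cycle. On a core where MUL takes 2 cycles (AVR, for example), the three MULs bring the total to 13.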

With the normal shifting sequence, 6 bits will eat 18 clock cycles, whether unsigned (LSR) or signed (ASR).