Can an x86/x64 CPU/memory system be changed into a barrel processor?
I shall provide an idea here and then you guys figure out if it would be possible or not.
What I would want as a programmer is something like the following:
- Request memory contents/addresses with an instruction which does not block, for example:
EnqueueReadRequest address1
Then it should be possible to "machine gun" these requests like so:
EnqueueReadRequest address1
EnqueueReadRequest address2
EnqueueReadRequest address3
EnqueueReadRequest address4
EnqueueReadRequest address5
- Block on the response queue and get the memory contents:
DequeueReadResponse register1
do something with register1, perhaps enqueue another read request
DequeueReadResponse register2
DequeueReadResponse register3
If the queues act in order... then this would be sufficient.
Otherwise extra information would be necessary to know which is what.
So if the queues are out of order, the dequeue would also need to provide which address the contents were for:
DequeueReadResponse content_register1, address_register2
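To make the in-order versus out-of-order difference concrete, here is a minimal software sketch of the read queues. It is only a toy model: the class `ReadQueues`, its method names and the `service` step (a stand-in for the memory system) are made up for illustration and do not correspond to any real x86/x64 instruction.

```python
from collections import deque


class ReadQueues:
    """Toy model of the proposed read request/response queues (hypothetical)."""

    def __init__(self, memory, in_order=True):
        self.memory = memory        # dict: address -> contents
        self.in_order = in_order
        self.requests = deque()     # read request queue
        self.responses = deque()    # read response queue

    def enqueue_read_request(self, address):
        self.requests.append(address)

    def service(self):
        """Stand-in for the memory system: answer every pending request."""
        pending = list(self.requests)
        self.requests.clear()
        if not self.in_order:
            pending.reverse()       # any order other than issue order would do
        for address in pending:
            self.responses.append((address, self.memory[address]))

    def dequeue_read_response(self):
        address, contents = self.responses.popleft()
        # In order: the contents alone are enough to know which request it was.
        # Out of order: the address has to come back with the contents.
        return contents if self.in_order else (contents, address)


memory = {0x10: "a", 0x20: "b", 0x30: "c"}   # made-up contents

q = ReadQueues(memory, in_order=True)
for addr in (0x10, 0x20, 0x30):              # "machine gun" the requests
    q.enqueue_read_request(addr)
q.service()
in_order_result = [q.dequeue_read_response() for _ in range(3)]

q = ReadQueues(memory, in_order=False)
for addr in (0x10, 0x20, 0x30):
    q.enqueue_read_request(addr)
q.service()
out_of_order_result = [q.dequeue_read_response() for _ in range(3)]
```

In the in-order case the program can match responses to requests purely by position; in the out-of-order case each response carries its address, which is the extra register in the two-operand dequeue form.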
The same would be done for writing as well:
EnqueueWriteRequest address1, content_register
EnqueueWriteRequest address2, content_register
EnqueueWriteRequest address3, content_register
There could then also be a response queue which notifies the thread when certain memory addresses were written.
DequeueWriteResponse register1 (in order design)
DequeueWriteResponse content_register1, address_register2 (out of order design)
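The write side could be modelled the same way. The sketch below is again only an assumption of how it might behave: `enqueue_write_request`, `service_writes` and `dequeue_write_response` are hypothetical names, and the response is assumed to carry both contents and address, as in the out-of-order form above.

```python
from collections import deque

memory = {}                  # toy memory: address -> contents
write_requests = deque()     # write request queue
write_responses = deque()    # write response queue


def enqueue_write_request(address, contents):
    write_requests.append((address, contents))


def service_writes():
    """Stand-in for the memory system: perform the writes and acknowledge them."""
    while write_requests:
        address, contents = write_requests.popleft()
        memory[address] = contents
        write_responses.append((contents, address))


def dequeue_write_response():
    # Returns (contents, address), so the thread knows which write completed.
    return write_responses.popleft()


for address, contents in ((0x10, "x"), (0x20, "y"), (0x30, "z")):
    enqueue_write_request(address, contents)
service_writes()
acks = [dequeue_write_response() for _ in range(3)]
```

The thread that issued the writes never waits on any single store; it only finds out afterwards, via the response queue, which addresses have actually been written.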
There could also be some special instructions which would return queue status without blocking...
Like queue empty count, queue full count, queue max count and perhaps a queue up count which could be used to change queue status in case something happened to the queue.
For example, each queue has a maximum number of entries available.
The queueing/dequeuing instructions mentioned above would block until they succeed (meaning their request is placed on the queue or a response is removed from the queue).
The counting instructions would not block.
This way the cpu would have 4 queues at least:
- Read Request Queue
- Read Response Queue
- Write Request Queue
- Write Response Queue
Each queue would have a certain maximum size.
Each queue has counters to indicate how many free entries there are and how many taken entries there are.
These are also queryable via instructions that do not block the thread. The counters are protected via hardware mutexes or so because of the queueing and dequeuing, but as long as nothing is happening these counters should be able to return properly. For example:
GetReadRequestQueueEmptyCount register
GetReadRequestQueueFullCount register
GetReadResponseQueueEmptyCount register
GetReadResponseQueueFillCount register
GetWriteRequestQueueEmptyCount register
GetWriteRequestQueueFullCount register
GetWriteResponseQueueEmptyCount register
GetWriteResponseQueueFillCount register
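As a rough software analogy for one such queue with its counters, the sketch below wraps Python's standard `queue.Queue`. The class `CountedQueue` and its method names are hypothetical; the point is only that enqueue/dequeue may block while the count queries return immediately.

```python
import queue


class CountedQueue:
    """Bounded queue with non-blocking status counters (hypothetical model)."""

    def __init__(self, max_count):
        self.max_count = max_count
        self._q = queue.Queue(maxsize=max_count)

    def enqueue(self, item):
        self._q.put(item)       # blocks while the queue is full

    def dequeue(self):
        return self._q.get()    # blocks while the queue is empty

    def fill_count(self):
        # Taken entries. Takes the internal lock briefly but never waits
        # for the queue to change state.
        return self._q.qsize()

    def empty_count(self):
        # Free entries, derived from the maximum size.
        return self.max_count - self._q.qsize()


read_request_queue = CountedQueue(max_count=4)
read_request_queue.enqueue(0x1000)
read_request_queue.enqueue(0x2000)
counts = (read_request_queue.empty_count(), read_request_queue.fill_count())
```

With other threads active, a count can already be stale by the time the program looks at it, so it is a hint about queue status rather than a guarantee that the next enqueue or dequeue will not block.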
All instructions should be shareable between threads... so that, for example, one thread might be posting read requests and another thread might be retrieving the read responses.
Otherwise the first thread might block because the read request queue is full, with nobody draining the response queue.
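The sharing scenario can be sketched with three threads: one posting read requests, one standing in for the memory system, and one retrieving responses. Everything here is an assumption for illustration: the names `poster`, `memory_system`, `retriever` and `run`, the queue depth of 2, and the `STOP` sentinel used to shut the threads down cleanly.

```python
import queue
import threading

MEMORY = {addr: addr * 2 for addr in range(8)}   # made-up contents
read_requests = queue.Queue(maxsize=2)           # deliberately small queues
read_responses = queue.Queue(maxsize=2)
STOP = None                                      # sentinel, not a real address


def memory_system():
    """Stand-in for the memory side: answer requests in order."""
    while True:
        address = read_requests.get()
        if address is STOP:
            read_responses.put(STOP)
            return
        read_responses.put((address, MEMORY[address]))


def poster(addresses):
    for address in addresses:
        read_requests.put(address)   # blocks while the request queue is full
    read_requests.put(STOP)


def retriever(results):
    while True:
        response = read_responses.get()   # blocks while nothing has arrived
        if response is STOP:
            return
        results.append(response)


def run(addresses):
    results = []
    threads = [
        threading.Thread(target=memory_system),
        threading.Thread(target=poster, args=(addresses,)),
        threading.Thread(target=retriever, args=(results,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run(list(range(8)))
```

With queues this small, a single thread that tried to post all eight requests before dequeuing anything would stall once both queues filled up; splitting the posting and retrieving across threads is what keeps things moving.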
Alternatively, perhaps the instructions could also be made non-blocking and return a status code to indicate whether the operation succeeded or not. However, then an additional code or mode would also be necessary to specify if it should be blocking or non-blocking... which might make things a bit too complex, but this is a hardware-maker decision.
In case sharing between many threads is too difficult, impossible or too slow, then non-blocking might be better: the thread can then cycle around the read responses and see if anything came in so it can do something... however, this would lead to high cpu usage. So for efficiency's sake blocking is preferred, or perhaps a context switch until the thread no longer blocks. It would then still be necessary for the thread to somehow deal with responses... so this seems to need multiple threads working together for the blocking situation.
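For comparison, a non-blocking variant could look roughly like this. The helpers `try_enqueue` and `try_dequeue` return a status instead of blocking, and a single thread polls both queues. These names, the queue depth and the inline "memory system" step are all hypothetical; the poll counter is only there to show where the busy-waiting cost would come from.

```python
import queue


def try_enqueue(q, item):
    """Non-blocking enqueue: returns True on success, False if the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


def try_dequeue(q):
    """Non-blocking dequeue: returns (True, item) or (False, None) if empty."""
    try:
        return True, q.get_nowait()
    except queue.Empty:
        return False, None


def run_single_thread(addresses, memory, depth=2):
    requests = queue.Queue(maxsize=depth)
    responses = queue.Queue(maxsize=depth)
    pending = list(addresses)
    results = []
    polls = 0
    while len(results) < len(addresses):
        polls += 1
        # Post the next request if there is room; otherwise just move on.
        if pending and try_enqueue(requests, pending[0]):
            pending.pop(0)
        # Stand-in for the memory system: answer one request if there is room.
        if not responses.full():
            ok, address = try_dequeue(requests)
            if ok:
                responses.put_nowait((address, memory[address]))
        # Pick up a response if one has arrived.
        ok, response = try_dequeue(responses)
        if ok:
            results.append(response)
    return results, polls


memory = {addr: addr + 100 for addr in range(6)}   # made-up contents
results, polls = run_single_thread(list(range(6)), memory)
```

One thread is enough here because nothing ever blocks, but in a real system most polls would come back empty while memory is still busy, which is the high cpu usage mentioned above.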
The memory system/chips would probably also need some modifications to be able to deal with these memory requests and return responses.
Perhaps also special wiring/protocols to be able to "pipeline"/transfer as many of these requests/responses back and forth as possible.
So what do you think of a "barrel"-like addition to current AMD/Intel x86/x64 CPUs and their memory systems? Possible or not?
The idea described above is a bit messy... but it's the idea that counts... if CPU manufacturers are interested I might work it out some more to see how it would flesh out/work exactly ;)