LATTUR

 

A High Performance RTOS Specification

 

-Karthikeyan Samynathan-

 

Comments to: karthiksamynathan@yahoo.com

 

 

 


INTRODUCTION

 

Lattur is a multitasked, pre-emptive RTOS that has provides and implements the following features.

 

Lattur’s kernel has real time pre-emption and scheduling. The scheduler’s run queue is searched in each timer interrupt. If there is a process of higher priority that has become ready to run, then the dispatcher interrupt is raised at a level lower than that of the timer interrupt, and the timer interrupt returns. When the dispatch interrupt is serviced the new task is dispatched.

 

Task based execution Model. There would be a single task running at any time.

 

The entire OS would be running in Kernel Mode or any other equivalent mode in any processor architecture it may be ported to.

 

Lattur consists of Structured Exception Handling. There will be definite interrupt levels at which the processor may go to in occurrence of an interrupt.

 

Lattur’s memory management is structured too. It contains support for dynamic memory management and corruption detection. Debug aids and quick memory pools are also supported.

 

There is also a supported structured Device management strategy to implement portable and easily configurable Device drivers. One major requirement in developing device driver architecture is, the degree of complexity involved in porting existing Vxworks drivers. All device drivers should be properly described in the OS. All calls to devices will be done to the OS' API's and the OS takes the charge of routing the requests. No layerings of device drivers are allowed.  Only the bare device driving is to be taken care of the device driver. The rest of the information handling should be done by Tasks in the OS. This saves a lot of complexity and time for flowing through the driver stack with the help of the OS.

 

System time management and scalable timer constructs. The timer data structure used by tasks would be scalable for large number of timers. The OS should not lose performance when

   The number of timer to manage is too high!!

 

Lattur also defines constructs for Inter Process Communication and synchronization. Some of the supported IPC mechanisms would be.

 

. Event queues

. Mail Boxes

. Semaphores

. Event timers

. Signals

 

  More will be added to this list as requirements are seen.

 

Has integrated terminal support and a simple and flexible CLI to interact with users through the terminal.

 

Lattur also supports logging messages with associated error levels.

 

Initially, Lattur will be capable of running on MIPS platform.

 


ARCHITECTURE

 

For portability sake the Lattur kernel will be implemented as two parts. One will be the platform independent part and another the platform dependent part. The former is called the kernel and the latter the HAL (Hardware Abstraction Layer). This is done so that when a port to a new platform is required. We need not change arbitrary parts of the kernel to make it to work. It would be enough if we provide the HAL interface API’s to the kernel and implement them alone, all the kernel code could be reused and is already tested.

 

But to implement this sort of architecture, a stringent coding process is to be adopted. And the kernel as well as the applications should not assume anything about the platform or the HAL. All application requests should be directed to the Kernel who then calls the appropriate HAL API. This would isolate the HAL dependencies off the applications, and would provide a clean environment to port the kernel over other platforms.

 

Diagrammatically the architecture may be represented as.

 

 

 

 

 


Kernel Layer (OsK)

 

 
Task API’s

 

 

 

 

                                            Well Defined APIs

 

 

 

 

 

 

 

 

 

 


                                           Architecture of an easily portable kernel.

 

 

 

 

 

 

The kernel of Lattur would provide the following interface to the Applications and the HAL

 

 The generic structure of an HAL layer API provided by the kernel would be,

 

_Os_class_routine ()

 

An application is only expected to call the entry points into the kernel. They should not assume anything of the HAL or call directly the HAL entry points.

 

Similarly on the HAL, the entry points exported contains the general structure of, __hal_class_service () such as __hal_exception_init ()

 

 

Model Of a Lattur Task

 

   A task in lattur is a unit or in other words a thread of execution of the OS. In fact In an RTOS has all the tasks' code loaded into the RAM regardless of when and how much time the task would be executed. This is done so that we do not have the overhead of allocating space dynamically, for dynamic loading of tasks, reading tasks in, dumping them out and maintaining associated data structures.

 

 

  So all the tasks in an RTOS are kept intact in the RAM, and are used when they are needed. One more advantage of this approach is that, we need not maintain page tables since there is only one process running thought many threads (tasks) can be multitasked. So the TLB (translation Look Aside buffer), which may be supported in some platforms for doing page translation without the help of software by caching recently used pages on chip in the CPU may be setup statically, no page misses may occur. This actually helps us from walking away the overhead of maintaining page tables and handling page faults. In fact, if the target architecture provides a piece of un-mapped memory (memory that is not translated through the TLB), the RTOS should consider it as a privilege to live with.

 

    Now since in an RTOS a task is just a thread of execution, the components that make up a task are easy to distinguish. Each task has a execution set and associated data (stack, heap, BSS). Fortunately since all the tasks are going to live together in a single image, the BSS and the static data sections are shared by all the tasks. The heap is a global data area where the tasks can do Dynamic memory allocation through the OsK. The entire heap library is built into the kernel and the HAL does not interfere with this scheme.

 

  The code for all the tasks or the execution set is also shared, and is free for use. By this, I mean that a single execution set is not fixed for a particular task. It is absolutely possible to share code between two tasks.

 

  So we would have a task as a simple 'ANSI C' function, which runs in an infinite while loop and does some repetitive job. The format of a task would be such as...

 

 

task SIMPLE_TASK (task_argument_set args) {

 

setup wait queues and events;

 

while(1) {

 

sleep on queue or event();  // We reach this point if we woke up from

                                            // previous sleep.

if (we_woke_up_from_signal) {

        do work for the particular signal;

        if kill signal Do cleanup and return

 }

Get_event_from_os();

Process_event();

Send_events_to_other_tasks();

} // Continue sleep

 

}

 

 

So a task has the following context...

 

1. Register Context

2. Stack

3. Event sets and queues.

4. Signal handlers.

 

   Each task is a different thread of execution. This makes the point evident that there should be a separate stack and register context for each task. The even sets and queues are those mechanisms that the task creates and manages and uses to receive information from another task or the kernel.

 

   Any task can be delivered a signal, for which the kernel has a default handler. For some signals the task can have its own signal handlers. Any task that is sleeping and has received a signal would be woke up from its sleep. Generally the stack that the task runs will have a function called scavenger() at the end. This would decay the task, by returning the OsK components that were used on this task, including the stack and the control blocks.

 

  Each task can be in the following states.

 

1. Run

2. Sleep

3. Ready to Run

4. Dead

To create a task the user has to call an add_task routine that would add and declare a new task to the OsK. Then on the task will be eligible for execution, according to the scheduling algorithm.

 

  During the execution period the task will be pre-empted, to give time for other tasks of equal or high priority. The task can also voluntarily suspend by blocking on an event, waiting for something to happen. In this case the task is put to sleep. When the event fires, it wakes up the tasks waiting based on a policy with which the event was instantiated. This policy may be to wakeup the longest waiting task or the highest priority task waiting on the event, or all the tasks waiting on the event.

 

   Once the task is woken up, it is subjected to the normal scheduling algorithm to get scheduled again. If the task has the priorty_sleep bit set in the flags, it is put in the queue after the current executing task. This will make sure that this task would execute in another two quantum, provided that other higher priority tasks are not awake.

 

    Once a task decides that it has done with whatever is required, it may just return, so that the scavenger buried in the stack at the bottom would return all the resources and kill the task. The task is no more present. All events directed to this task will die.

 

    Tasks communicate with each other through IPC mechanisms that are built over the event model. For more information refer to the event model for Lattur.

 

    While creation the task has to specify the size of the stack, the initial argument to be passed to the task, and the entry point of the task. The OsK will create a stack of requested size initialize the entire stack to a magic pattern that is not zero and make it the task’s stack.  The magic pattern in an RTOS helps many purposes, it helps debugging, it serves as a visual aid for locating structures, it also helps when some code misbehaves using the magic locations wrongly as an address, by taking an unaligned exception, ‘cause the magic numbers are all selected to be unaligned numbers.

 

   Stack checking includes one of the checking mechanisms for checking the task integrity. Each task has a fencepost at the beginning and the end. A periodic process checks the integrity of the fence posts periodically making sure that the task does not overflow corrupting memory out of the stack.

 

   A task could also ask for a stack growth, in case the task sees a requirement and thinks that it is running low on stack. This is also done in the background stack_check task, which along with the check of integrity of the fencepost also checks for the high water mark on the stack. If the high water mark is too high, the stack_check task tries to grow the task’s stack.


 

TASK SCHEDULING

 

 Scheduling in an RTOS is always done with the least effort – best response behavior. In an RTOS unlike the general purpose OS’ pre-emption not necessarily happen only at quantum expiry or task blocking, rather it could happen at each and every clock tick. This ensures that only the highest priority task is always running on the CPU at any given time.

 

  This also requires that the Kernel keeps on checking the scheduling queue on each and every interrupt. This is really a costly scheme, and has to be optimized so that the check is simple and does not span multiple cache lines.

 

 Some of the parameters that would be good to maintain are,

1.      The response time of the Kernel as a function of the tasks that are waiting on the same priority level and higher priority level.

2.      The Check time to find out if a higher priority task is waiting.

 

Both of these times should be reduced to the minimum.

 

Lattur supports both HRT and SRT systems. The HRT tasks are the highest priority tasks in the system. Which means that when there are HRT tasks ready to run in the system, no other SRT task could run. Within the HRT tasks we maintain a Earliest deadline first scheduling, where the deadline by default is the taken as,

 

 Default Deadline = 1/number_of_HRT tasks.

 

The User could also specify a different deadline while spawning the task. The algorithm will try to schedule the earliest deadline task first. In case there are collisions, we schedule such that the average response becomes high, by scheduling the task that has the most waiting time. If there is a clash in that too we select the task that has the most number of deadline misses. If there is still a collision then we make a decision based upon the quantum of the tasks, and provide the task with the smallest quantum. If no other method gives way we use a lottery scheduling and give the CPU to one of the tasks.

 

Each HRT task has an associated quantum too. This quantum is the time the Kernel would promise the CPU once the TASK is put onto the CPU.  A HRT task is removed off the CPU only at the end of the quantum associated with it, or after the task blocks on any output. On blocking the HRT task could nominate a temporary deadline, which will be obeyed by the kernel once as soon as the task awakes, after which the kernel will start to use the older deadline. Just a sort of boosting on the HRT event that could happen and need a quick response.

 

One important thing is that, to have a perfect HRT system, the kernel support is only a infrastructure, good response times will show up only if the user who creates the tasks does a proper study of the timing properties and relations of the task and assigns deadlines and quantum’s accordingly.

 

Lattur also has another scheduling class called the Real time scheduling class. Which provides soft real time responses to the tasks registered. This class has 5 different priorities. And the scheduling algorithm is the highest priority first scheduling. If tasks of the same priority are waiting then a round robin kind of scheduling is adopted within the same priority. As long as there is a higher priority task waiting, a lower priority task waiting to get scheduled is not scheduled.

 

In this class only the highest priority task gets scheduled, all other tasks wait until the higher priority tasks are blocked. The kernel does not guarantee any timing properties for scheduling in this class. Assigning a task which does not block and all the time is ready to run, all other lower priority tasks will not get a chance to run.  This is a typical priority-masking problem. So even in this case of scheduling, the user has to take care so about which tasks are allocated what priority.


 

Lattur Memory Model

 

Lattur supports dynamic memory management, inclusive of bounded and non-bounded allocation time pools. The important tasks allocated to the memory manager are. Net pools are also supported for network packet management. These pools would be trimmed and specially designed to handle network packets. The format will be similar to the BSD type with the packet descriptor and the packet separate from each other.

 

For the time being we will not discuss more about the packet pools and their management, this is to be dealt with after the design of the Lattur Driver Architecture is completed. For the time being we will just discuss the Lattur Dynamic memory Management.

 

The major works of the dynamic memory manager are,

 

·          Manage the Dynamic Memory of the OS

·          Locate corruption if anything occurs in the memory pools.

·          Provide Quick Malloc support for interrupt levels, having pre-allocated blocks in store

·          Provide Network Buffer interface according to the scheme selected for the protocol implementation.

 

 

1. Dynamic memory management.

 

The entire ram is initially divided into Regions. Based upon which the TLB is setup.

 

On init time the region that has to serve the dynamic memory requirements is used to create a data structure called the heap.

 

 This heap contains freelists of configurable size (configurable only at the time of creation of the heap).  Then the memory in the region is carved and put into the freelist at a single node where the size fits. Each freelist has an associated size and a double linked list that contains the free blocks for that region of that particular size range. All malloc () and free () code actually manipulate these free lists.

 

The search algorithm during a malloc is simple. Generally we search using the size of the requested block. We start the search from the freelist that has the size just higher than the requested size. If the free list does not have any free blocks, we walk up to higher sized free lists to find a free block. If no other block is free we return to the freelist just before the one that we started our search with.  When a block is found we carve the rest of the block over that requested size and put it in an appropriate free list.

 

Carving the block is done only if the leftover size is greater than the size of the free block type. Remember that we store the free block in the body of the block. Also when a block is returned to the heap, it is joined with its previous or next block or both if they are also on the freelist implying they are free. This feature would minimize fragmentation in case there is any. Fragmentation is caused because a number of small allocations cause a lot of small free blocks, and finally though there is enough memory for a large allocation there is no one block, which is of the required size.

 

The general structure of the block and the heap and their implementation are platform independent. These data structures and code, which manage them, can operate under any platform because they do not use any platform dependent constructs.

 

 

2. Detection of corruption and Aid for debugging

 

Corruption happens when some task writes outside of its allocated memory. There are two types of corruption, one is when a task overwrites through its block to the next block, another is when a task writes to a bogus location where an arbitrary block is present and gets corrupted. All of these problems are because of software errors, and need to be fixed. So corruption detection is an important part of any RTOS.

 

Corruption detection is divided into two parts.

 

 a. Synchronous corruption detection.

 

      Here the corruption is checked in the context of the thread that mallocs or frees the block. Hence this is called synchronous.

 

Corruption is detected by maintaining some overhead storage along with the blocks. Each block contains a magic number at the beginning. There is also Fencepost magic at the end. Therefore whenever the block is returned we check for the block overruns and under runs, by checking the magic’s at the beginning and the end. Also to aid more sanity checking each block has a reference count which should be > 0 when a block is returned.

 

A task can explicitly increase the recount by calling a OS routine block (block type *) in this case the block will not be freed until free () IS CALLED twice because the first time the block's recount would not fall to 0.

 

 

 

 

b. Asynchronous

 

      The requirement do an asynchronous checking is to make sure that there is no corruption made by a task after it has returned the block. This check is made by a background task, which runs at Normal priority and checks for corruptions. The checks that this Task makes is

 

. Check Block Magic and Fence Magic

. Check Free Magic and

. Integrity of the Free and the block Linked lists.

 

       The entire block is poisoned (stored with a poison Magic pattern value of 0x11011011 Please see Note 1. at bottom of page). This poisoning is not to help spot corruptions but is to help to debug in case a corruption is caught by any of the methods above. This is because checking of the entire block for the poison values might be tedious and a bad design rule for a RTOS.

 

Also to aid debugging, each block stores the allocating task's PC. And each freeblock stores the freeing task's PC. By chance if there is a corruption these values would help us to track down the problem.

 

When there is a Block corruption the following information is listed, along with other information the Os Logs as,

 

   . The Reason for the crash (what type of memory error or corruption).

   . Dump of the corrupted block, the next block and the previous block

 

3. Provide Quick Memory support

 

One fall off of the heap method is that the search and allocation algorithm are not time bounded, neither are they cheap in terms of processing time. This sees a need for a requirement for a more streamlined and time bounded algorithm. This is the reason that lattur supports a new strategy of memory allocation called quick memory pools.

 

This support is actually built over the heap infrastructure. The need for a quick memory is that we may not be able to do an extensive search in the heap for an allocation in interrupts, moreover sharing resources in interrupt and process level should be minimized to reduce complexity and dependencies. One more advantage of having this infrastructure is that we may not be able to pay the cost of a huge overhead of debugging information stored in the blocks for small allocation requests, if we go for a quick memory pool type infrastructure the overhead per block is really less.

 

 

 

The quick memory pool is a structure built over the heap manager. We allocate a big block in the beginning and then divide this block into smaller quick memory blocks, which have fewer overheads. The qm pool structure has an array of pointers to the quick memory locations inside this master block. So whenever some task asks for memory the qm manager uses the pool array as a stack to serve the request. Now the overhead per block is reduced to a smaller value than in a separate block.

 

 

Another set of quick memory is a zero overhead quick memory used for small allocation sizes such as one or two words. This speeds up the allocations and also helps reducing overhead percentage of the entire system.

 

4. Packet buffer Management,

 

    TBD.

 

Not yet decided. To decide only after the driver architecture is decided with.

 

Magic Numbers:

All Magic values, especially the poison Magic are purposefully unaligned Long values so that they cannot be mistaken for a pointer by software that de-references it due to a software error, because this will crash the machine on an unaligned address.


 

Concurrency Management:

 

To manage re-entrance in code, we need to setup some method of synchronization. Generally all Os’ use a locking infrastructure to prevent simultaneous access of the same data structure by multiple threads of execution. 

 

The locking model of lattur requires being,

 

1.Scalable:

 

        This means that not only small data structures but also large data structures should be able to be locked. Locking should not work on a single structure or domain. It should be global (or local in some cases) where it may be operatable even outside of a single processor. Which means that even in case that there are many processors the locking scheme should work fine. Scalability also means that the complexity of locking should not increase with the number of processes that start to contend for the resource. As we may see latter locking in software using critical path codes would not scale well and that if the code has to be scalable in all senses then the code has to use hardware support.

 

2.Atomic:

 

There is no question on this account. If a locking scheme is not atomic then there is a problem that multiple processors trying to access the same resource could screw up the locking mechanisms by classical race conditions. Even in single processor there is a problem that there could be a race condition between different threads of execution. For example there could be a contention for a single resource in both the interrupt level and in process level, or even between two processes, which could cause race conditions if there is no guaranteed atomicity in the locking mechanisms. This makes us need an atomic primitive support for locking without race conditions.

 

There could be different strategies adopted for obtaining atomicity, this can be either hardware supported or software supported. But each method has its pros and cons.  For example software locking has the problem that no interrupts should occur in the time when the modification occurs. Other problems in software locking include the running of expensive algorithms which do not scale well when many process start to contend for the same resource. Asking a computer science graduate would always give the answer that an algorithm is always good. But then it is not true to the problems stated above. The locking scheme could also use the support from hardware to do atomic locking

 

 

 

 

3. Locking in the interrupt context:

 

Locking in the interrupt context has more significance and importance than in the process context. This is because in process context if the process is not able to obtain a lock, the process could be put to sleep or the scheduler may return to the process stating that the resource is busy or the lock is currently not available. The process in could handle this situation by either retrying after some time or, aborting the trials. This is dependent on the implementation of the process or the nature of the resource.

 

Handling locking in interrupt level is kind of hard, because in interrupt level, there cannot be retries. The interrupt needs to complete as soon as possible. So as far as possible locking in interrupts level should not be encouraged, unless or otherwise there is a necessity to access a shared resource. One of the common applications of interrupt level locking is when there is a typical producer consumer problem. This means that at the interrupt level some kind of information is pulled in from the hardware and is put in the form of a queue for consumption in a process level part. Now this queue is a shared resource and locking needs to be done to ensure there is no race condition. This is particularly tricky because that in interrupt level we need to complete before any process level code can run. Which implies that if in the interrupt we are going to wait for the lock to be removed then we are going o wait forever, if the lock is held in process level.

 

Backing off from an active lock is different in cases from interrupt level and process level. In interrupt level, there could not be a priority inversion problem. Cause the sharing of the resource is between the process level code and interrupt level code. So the interrupt needs to have some method of fallback processing, where if the interrupt level code is unable to obtain the lock it then adds it to another queue, which is maintained by it completely. Then the next time the interrupt occurs and gets the lock on the resource it adds all the elements it backed up into the queue. (Boy nice day)

 

Another situation is when interrupts of different levels share a resource, though there is no such situation that would require to have such a sharing, I just don't want to rule out the possibility that it could be possible. In this case the same backing up strategy would help.

 

The best of the possibilities is when the lock is contended by two tasks. This situation is interesting cause unlike interrupts the order of execution of tasks in an RTOS is of high importance. When the two tasks are of the same priority then the contention would be equally distributed cause the two tasks run at the same priority and can get to run at the same priority level of the processor according to a scheduling algorithm which schedules processes within the same priority. While waiting on a lock, the process can mention if it is okay to block, in which case, if the lock is unable to be obtained then the process is blocked (suspended) and other tasks are run according to the scheduling algorithm. Once the lock becomes ready the process is woke up and a signal is passed to it saying that the lock has been released.

 

Any number of processes can block on a lock, and all of them would be woke up and allowed to contend for the lock. In the above case only the highest priority process that woke up would get the lock, if there are more that one task of the same priority waiting on the same lock, then they contend in accordance with the scheduling algorithm that decides who runs first. In general, the task that runs first gets the lock (quite logical Huh!!)

 

In the case that the task indicates that it does not wish to block on the lock then, there is a BUSY flag returned to the process stating that the lock is busy and the kernel is unable to get it. The task could decide what to do next...

 

The interesting case is when there are two processes and they are of different priorities, and that too when the process that has the highest Priority is waiting for a lock that a low priority process is holding. In this case if we are to return busy and if the process is going to wait for the lock it is going to wait until the low priority process is going to work on the lock, further worse is when there is a process that is of priority in between the two, it is going to execute, blocking the low priority process from executing and which implies that the lock is never released and the high priority process is also going o wait (huh!! Not the first time).

 

This problem is called PRIORITY INVERSION, which means that the high priority task is now waiting as if it were at a lower priority than the lowest priority task. Which is totally unacceptable in an RTOS. In an RTOS a higher priority task is to be given the higher priority - in good sense, but I know it is a bad framing of the sentence :). So there should be some way that the higher priority task should get to execute. This could happen only if; the lower priority task executes and relieves the lock. Of course we could not abruptly pull away the lock from the low priority Task. This would cause a lot of race conditions by itself causing a solution to recurse into the same problem for which we started to think of a solution.

 

There is one well-known solution to this problem, we could just assign the low priority task the priority of the higher priority task temporarily and watch the lock on the resource. Now the low priority task would actually start to execute as a high priority task using all the privileges of that level. And actually when the task unlocks the lock, we know that we are unlocking a special lock, which we have noted down in the Priority inversion bit in the lock. So now we change the priority of the current task to the original priority and allow it to execute. The next timer interrupt would actually preempt the task running at low priority and the task with the highest priority then is allowed to execute. So this would prevent the higher priority task from being waiting forever for the lower priority task from executing.

 

But then there is a possibility that this lower priority task could always try to lock this recourse and try to always obtain a high priority. Task. And try to monopolize the lock. But again the design of the tasks lie only in the hands of the programmer, with good programming practice this kind of situation should be very rare.

 

4.Portable:

 

As any other kernel routines, this module should also be portable easily across many platforms. This means that we should follow the normal Kernel/Hal segregation in this module too.

 

 

Lattur implementation of the locking module:

 

Lattur depends upon the HAL to provide the primitive support for locking. This support provided may be from the hardware or from software. But it needs to be from the Hal. This would help because; the kernel code can be transparent to the implementation of the HAL code. And so portability is kind of easy.

 

 

At the lowest level there is a structure called the SCHEDLOCK. This structure contains only one integer, whose value needs to be 1 if locked or 0 if unlocked. This structure is supported by primitive routines from the hal and the Kernel.

 

In the hal there is a __hal_lock_sl and __hal_unlock_sl code that takes as argument this SCHEDLOCK structure and tries to atomically lock the structure. This means the hal tries to make the value of this structure from 0->1 provided that the initial value of the structure was 0. IF the structure was already locked, or for any reasons that the structure was locked while the HAL was trying to lock it, the HAL returns saying the lock was busy, it also does not try to change the value of the lock when it was unable to obtain the lock (sounds logical!!)

 

The OS provides an interface to the locking and unlocking of the sched lock. It is this functions that any other code should use and not the HAL's entry point.  The Hal may use any kind of algorithm inside to promise atomic locking of the lock. It may or may not use hardware support.

 

  This locking primitive is the unit over which any synchronization primitive has to be built. For the time being we have to complete the scheduler to think about the locking and synchronization primitives.

 


Lattur Interrupt Model and Interrupt Initialization

 

Interrupt handling in an RTOS is a very important part of it. Interrupts are events flagged by hardware to denote software of the happening of certain event external or internal to the system. The processor then stops the current execution sequence and starts fetching and executing instructions from a particular address called the Interrupt vector location. The kernel strategically puts interrupt-handling code into the processor’s interrupt vector locations. 

 

                                                                                                                                                                                                                       At any given time the processor could receive interrupts from any number of devices, prioritization of these interrupts are important because depending on the application some interrupts may be frequent and some more less frequent, in this case the less frequent interrupt has to be given precedence over the higher frequent interrupt. This prioritization could be done in hardware or software. Some processors have interrupts prioritized on their pins. Generally these processors have more than one pin assigned for interrupts and each pin is assigned a priority. 

 

This means that if the processor is handling an interrupt signaled on a pin of priority Y and an interrupt occurs at a pin of priority X higher than Y, the processor stops executing the current interrupt and starts executing from the interrupt vector location of the interrupt at point X.

 

                                                                                                                                                                                                                       In most general RTOS applications, people tend to use RISC processors for the simplicity and performance. In RISC processors such as MIPStm processors, to reduce hardware complexity, all interrupt pins are assigned the same priority, which means that even in an interrupt, under proper bit settings any other interrupt may be recognized and the processor would start executing the Interrupt vector of that interrupt.

 

                                                                                                                                                                                                                       This collision should be avoided in RTOS’ as said before because the interrupts are all not of the same frequency/ Importance. This can be done both in software as well as hardware. Processors may have an external Programmable Interrupt control, that sits between the interrupting devices and the processor and can be programmed with priority of the interrupt. This method could be fast but need extra hardware, which may be a preferred solution for system on chips, but for system on board applications, handling in software reduces hardware complexity.

 

                                                                                                                                                                                                                       This is the reason Lattur chooses to handle interrupt prioritization in software. As in any RISC processor lattur assumes all interrupts too as exceptions. The terms interrupts and exceptions will be used interchangeably to describe interrupts. They will be distinguished at the point where they should be.

 

                                                                                                                                                                                                                      

Lattur - Structured Exception Handling Model

 

Each exception level is described by an exception structure.

 

Interrupt levels

 

typedef enum xleveltype_ {

 NORMAL_LEVEL,

 IL_6,

 IL_5,

 IL_4,

 IL_3,

 IL_2.

 IL_1,

 IL_0,

 EXCEPTION_LEVEL

} xleveltype;

 

 

Normally the processor is in NORMAL_MODE, where it is executing a normal task thread. And can rise to higher levels such as IL_0 Through IL_1, on reception of an interrupt on that level. The highest level is Interrupt level 6 or IL_6.

 

 

Whenever there is an exception (internal processor error) all interrupts are blocked. On setting particular bits there can be a nested exception. But before putting the processor on that state the exception handler should save some machine state so that there could be no screw up due to nested exceptions.

 

 

These exceptions can be for the following reasons

1. TLB Miss

2. Break/Trap/Watch

3. TLB Invalid

4. TLB Modified

5. Arithmetic Exceptions

6. Bus errors

7. Address errors.

8. Cache error

9. Reset

10. NMI

 

and more depending on the processor architecture ....

As far as the TLB is considered we set it up early in the init period and should not miss, so no TLB miss is valid. It is considered to be fatal and the machine crashes.

 

So for all exceptions including the TLB exception lattur crashes.

 

For the break / trap / watch exceptions, if a GDB stub is in action it gets control else the machine crashes denoting the error.

 

Crashing in an exception the contents dumped are...

 

1. The exception.

2. The register context.

3. The dump of the stack of the current executing task.

4. Stack trace of the current executing task.

5. The exception level the processor is in

 

To do the crash dump, we save the current stack and the exception context and then call a kernel entry point to print the crash context and crash the kernel.

 

 

The exception context of lattur the is described by the following structure.

 

typedef struct xcontext_ {

    exceptiontype _xception;  // The exception that occurred

    ulong *mdc;                      // Machine independent structure

    ulong *stackptr;                // Stack pointer

    ulong *interrupted_pc;     // Interrupted pc

    ulong prev_exception_level; // Level Before current exception

} xcontext;

 

Exception routine type is defined as,

 

typedef (* xroutinetype)(xcontextype *context)

 

This structure is used to define an exception...

 

typedef struct exceptiontype_ {

    xleveltype level;               // Level this exception  will occur

    xroutinetype routine;      // The routine to be called when this exception occurs

    stacktype *stack;           // The stack to use when this exception occurs

    ulong xcount;                 // The number of times this exception / interrupt occured

    uchar *exceptionname; // The name of the exception, THis is normally used

                                          // when the machine crashes or reports an error.

} exceptiontype;

 

 

The normal processing of an exception would begin with a platform dependent processing, which is initiated by the processor.

 

Design for the platform part in MIPS platform.

 

Normally exception handling begins from a fixed entry point. The routine to dispatch an exception routine would be something like.

 

·          Get the exception structure for this exception level.

·          Install the level's stack and save the old stack.

·          Save machine state...

·          Save registers...

·          Disable Interrupts.

·          Enable exceptions (exceptions will be disabled by the processor when an exception is processed).

·          Call the exceptions entry point.

·          Disable Exceptions

·          Enable Interrupts

·          Restore the excepted state (the stack, machine state, registers).

·          Return to the exceptioned routine.

 

 

This is a bit different in case that the exception that occurred is an interrupt. This is because we may need to implement interrupt priorities in software.

 

 

 

 

 

 


Lattur Crash information requirements

 

Whenever lattur crashes, the reason due to crash and some debugging information is printed and logged so that the problem could be tracked down easily.

 

Crashes can be of two types.

 

1. Crash in NORMAL_LEVEL, this happens when the kernel finds out there was,

 

. Memory corruption

. FATAL_ERROR due to the call of a system API with an error

  Parameter.

. Some of the system parameters were corrupted / destroyed??

  When there is an error in this case the following information is dumped as crash information.

. The exact cause why we are crashing.

. Register values.

. Stack trace.

. Stack-Dump of the current stack.

. Current task control block.

. Dump of memory block in case there was a memory corruption..

 

 

   After dumping all this the crash routine the takes a trap with a code to return to rom-monitor indicating we crashed.

 

 

2. Crash in Exception level. This is caused because we made an exception, which is illegal, and decide to crash in the exception. Normally we do not intend to handle any bad exceptions.

 

Such as,

. Address error exceptions.

. Bus error exception

. Cache error Exception

. All TLB exceptions.

. Co-processor Unusable.

. Illegal Instruction

. Integer Overflow.

. Machine check

 

 

  The following context is dumped along with the crash information in case of a crash in an exception level...

 

                                                                                                                                                                                                                      

. Exception why we crashed.

. Register dump.

          . Stack trace of the current task.

. Stack dump of the current TASK

. Interrupt Level at which the crash happened. If this is

  higher than NORMAL_LEVEL then the dump of the particular

  level interrupt stack is also included.

. current task control block.                                                                                                                                                                           

 

 

   After doing this it returns to the rom-monitor.

 

The following routine is used to print the generic crash information...

 

void print_crash_info(xcontext *_xcontext) {

 

. Print the exception information

.. Exception Name

. Print the register information

.. General purpose registers

.. Cause

.. Badvaddr

.. Status

.. EPC

.. Error EPC

. Search and print the stack trace

.. print_stack_trace(Stack Pointer)

. dump_stack(Stack Pointer).

. If Last_level > Normal_level

   While Last_level > Normal_level

      dump_stack(last_level.stack)

 

. Print Current Task values()..

. Return To Rom-Mon


 

Lattur Event management Model:

 

The event management model should be simple and quick, cause the entire RTOS would be driven with the event model. All other synchronization and communication mechanisms are based on this model. So need to be quick in accessing the event queues.

 

Each event may be waited on by many tasks and one task may wait on multiple events. The Task may block on the events or continue execution. Then there is asynchronous notification to tell the task that the event has happened. Also the task should be able to mention if it wants to sleep in all events or any event.

 

So the design requirements are,

 

1. Each task should be capable on listening to many events.

2. Any event can have multiple tasks that could be sleeping on it.

3. It should have a policy to wakeup tasks, may be according to FCFS or priority or wake up all events.  This policy should also be encapsulated in the event.

4. Searching for a task in an event list or searching for an event_ in a sleep queue of a task should be fast enough for searching events.

 

The implementation.

 

1. There are two linked lists encapsulated in a single structure called the wait block. This block encapsulates a linked list for the event list of the event and a task sleep list for the task.

 

2. Each event list starts on an event. And a task list start in a task for a given event, there is exactly one wait block for the set {task, wait-block, event}

 

3. To walk thro' the task's list of wait blocks we need to lock the task's wait block lock, and to lock the wait block list of the event we need to block the event's wait block list. These two locks are needed because the two lists are independent and can be accessed independent of one another.

 

4. To unlink a block in a list we need to first lock the list either in the task or in the event depending on the requirement.

 

5. The even also encapsulates the wakeup policy, which determines the pattern of the wakeup of tasks, when more than one task is waiting on the wait list of an event.

 

6. For each event there are exactly two states. A trigger function that triggers the event, in which case the event is assumed to have happened and we wakeup the tasks according to the policy of the event. This may be an all wakeup policy or a single wakeup or priority based wakeup. After waking up the event returns to the old state where it waits for another trigger.

7. Other api's include

 

* add_task_to_event(event *event, tcb *tcbptr)

* remove_task_from_event(event *event, tcb *tcbptr)

* search_event_for_task(event *event, tcb *tcbptr)

* delete_event (event *evn)

* trigger (event *evn)

 

All other synchronization items and Inter-process communication items are also based on the events. This locking mechanism is important in all cases where there is a share of a resourse of many threads of execution.

 

 

Other synchronization primitives are just wrappers over the basic event model.

Kontact: