Openreach FTTC and Wholesale ADSL use the same BT DLM system. Both share many of the same features, but there are 3 sub-systems for 20CN, 21CN & FTTC as each has slightly different profiles and parameters. The NGA FTTC system is operated by BT Openreach.
This tutorial looks at the DLM Function, focusing on which parameters are monitored, how it classifies a line and how it calculates whether any changes need to be made to the DLM profile.
~ DLM Introduction
BT Wholesale's Dynamic Line Management System is an extremely large topic. In an attempt to make it easier to digest, the subject has been split into several pages which you may find useful to aid understanding how the DLM works as a whole:
- The DLM System: - Looks at the hardware & software systems used to monitor broadband lines. Knowing what each device in the DLM system is responsible for and what it does helps to visualise DLM Management and the processes involved.
- The DLM Process: - Focuses on how the DLM system monitors a line and how it decides if any changes need to be made to your DLM profile. We describe in detail each process carried out by the DLM System, what algorithms are used and how the decision is made whether any changes need to be made to your DLM Profile.
- The DLM Profiles: - Although the DLM system & process are similar regardless of product, there are some slight differences between what parameters the system can configure depending on whether you have ADSL1, ADSL2+ or VDSL2. The DLM profiles page breaks down the differences between the products and what configuration changes can be made for each type of xDSL. (Page not yet published.)
~ Monitoring the line
Over the course of a day information about the line will be recorded by the DSLAM's Data Collector. The daily data file monitoring period is split up into 96 x 15 minute bins and each of those bins contains the following information about the line:
- Data indicative of user activity.
  - Based on traffic counts of both upstream & downstream traffic.
  - A non-zero traffic count is taken as indication that the line has had user activity.
  - Zero traffic count indicates the line has not been in use.
- Data indicative of instability.
  - One or more resynchronisations
  - One or more errors caused by code violations - ES/SES
  - Failed initialisations
- Connection Rate.
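As a rough illustration of what one 15 minute bin might hold, here is a minimal Python sketch. The class and field names are hypothetical (BT does not publish the record layout), the uptime field is assumed because uptime is used later in the process, and treating traffic in either direction as "activity" is also an assumption.

```python
from dataclasses import dataclass

# 24 hours split into 15 minute bins, as described above.
BINS_PER_DAY = 24 * 60 // 15


@dataclass
class Bin:
    """One 15 minute monitoring bin. All field names are illustrative only."""
    upstream_traffic: int     # traffic count, upstream
    downstream_traffic: int   # traffic count, downstream
    resyncs: int              # resynchronisations seen in this bin
    errored_seconds: int      # errors caused by code violations (ES/SES)
    failed_inits: int         # failed initialisations
    uptime_seconds: int       # assumed field: seconds in sync, 0 to 900
    line_rate_kbps: int       # connection rate

    @property
    def active(self) -> bool:
        # Assumption: a non-zero count in either direction marks the bin as in use.
        return self.upstream_traffic > 0 or self.downstream_traffic > 0
```

A bin with errors but zero traffic in both directions would report active as False, which matters later because instability in dormant bins is ignored.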
The Openreach monitoring period currently runs from 8pm to 8pm, although in 2019 a few lines may have been monitored using a temporal system where the monitoring period only runs during peak time. Any errors outside of that time are ignored. As at 2021 the bulk of lines are still using 8-8 system.
~ Stability Levels
There are three levels of stability profiles which may be applied to a line. The level of stability may have been chosen by the end-user, but usually the ISP will select a default profile on behalf of the user. This stability preference affects how the DLM will react to any period of instability on the line.
Stability Level | WBC Profile (20/21CN) | NGA Profile (FTTC) | Description |
1. Aggressive | Standard | Speed | Prioritise speed over stability for online gamers |
2. Normal | Stable | Standard | Best overall balance between speed and stability |
3. Stable | Super Stable | Stable | Prioritise stability over speed for IPTV |
 | Custom (SIN 472) | | Allows a CP to specify the thresholds which DLM will manage the line towards. |
There is some confusion over the naming & mapping of profiles between BTw and BToR - the latter of which does not allow for interleaving to be turned off for FTTC, despite mention being made in various documents.
- ISPs known to use NGA Speed Profile: AAISP, BT, Plusnet, Zen.
- ISPs known to use NGA Standard Profile: EE, Sky, TalkTalk, Vodafone.
~ Trigger Events
The DLM will monitor the line for any changes in stability. Earlier DLM systems relied purely upon sync events but circa 2010 the BT DLM system was amended to introduce a new method of detecting retrains and also included error detection as a trigger event.
The events used by the DLM system are:
- Total 24hr ES & SES
- Total 24hr Full Initialisations
- Total 24hr Failed Initialisations
- Total 24hr Uptime
- Total 24hr Unforced retrain count.
The events used by the RAP system are:
- Line rate in previous 24hr period
- Maximum line rate in previous 24hr period
- Minimum line rate in previous 24hr period
The upstream and downstream are monitored independently.
~ Detection of sync events
Whilst the DSLAM is capable of detecting loss of synchronisation and most modern routers are capable of sending a dying gasp message to indicate when loss of sync was through a power failure, BT's DLM system does NOT take any notice of dying gasp messages when it comes to counting retrain events†.
A retrain event is detected by "a RADIUS transaction having occurred" - ie a new authentication event has been recorded on to the BTw network, which is part of the handshake process of synchronisation.
The DLM only counts 'forced' retrains and will disregard any resyncs detected as being an Unforced Retrain or one caused by a Wide Area Event.
An unforced retrain is one in which the user switches off or unplugs their modem for "a period of time greater than the minimum period of time" and that a minimum period of time prior to or after a resynchronisation has elapsed without the line automatically attempting, but failing to establish a connection.
Because we know that the DLM collects data bins every 15 mins and that it monitors traffic count to see if the line is in use, it is therefore recommended if possible to try to leave the router switched off for 30 mins to ensure that the DLM sees at least one complete period of inactivity prior to the resync.
Algorithm: If a resync is detected in bin 'x' and bin ('x'-1) has > 0 seconds uptime then the DLM will class it as a forced retrain and count it. If (retrain count in bin x > 0) && (uptime in bin (x-1) == 0) then it is assumed that the retrain was caused by a user event (an unforced retrain) and it is disregarded.
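The bin check can be sketched in a few lines of Python. This is only an illustration of the rule that a retrain is disregarded as a user event when the preceding bin recorded zero uptime, and counted otherwise. The function name and the two-list input are hypothetical, and counting a retrain in the very first bin of the day (which has no preceding bin in the same file) is an assumption.

```python
def counted_retrains(retrains, uptime):
    """Return how many retrains count against the line.

    retrains[x] is the retrain count in bin x and uptime[x] is the number of
    seconds of uptime in bin x. Both lists cover the same run of 15 minute bins.
    """
    counted = 0
    for x, retrain_count in enumerate(retrains):
        if retrain_count == 0:
            continue
        # Preceding bin had no uptime at all: assume the user powered the
        # modem off, so the retrain is disregarded.
        if x > 0 and uptime[x - 1] == 0:
            continue
        # Assumption: a retrain in the first bin is counted, as there is no
        # preceding bin available to excuse it.
        counted += retrain_count
    return counted
```

For example, with retrains of [0, 1, 0, 1] and uptime of [900, 850, 0, 600], the retrain in the second bin is counted (the first bin had uptime) while the retrain in the fourth bin is disregarded (the third bin had none), giving a count of 1.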
† Note on Dying Gasp - Whilst DLM may not make use of the dying gasp message, nor is it mandatory for MCT, modem manufacturers are encouraged to implement its use for Openreach's Test and Diagnostic systems. This allows ISPs to check EUs have performed a power cycle of the modem prior to a potential engineer visit. See SIN 498 Section 3.2.5 R.OAM.4.
~ Detection of Errors
The DSLAM's Data Collector records the number of coding violations (errors) seen on your line. These figures are also displayed by some modem routers.
The type of coding violation that the DLM is interested in is Errored Seconds. The DLM then normalises any errors to the total uptime in order to even out any burst periods:
- Mean Time Between Errors (MTBE) = Uptime / Errored Second Count.
The MTBE measures Errored Seconds and SES only (Not HEC, CRC or FEC).
Note. Whilst the DSLAM system may record CRCs and FECs for other OSS purposes, there is only one code violation parameter recorded by the Element Manager used for the MTBE calculation. There are instances whereby if a line is performing particularly poorly, then RAMBo will undertake additional line monitoring direct with the DSLAM. In such cases it monitors additional parameters such as SNR Margin which are not normally monitored at all for most lines. For more information see: DLM System - Additional Line Monitoring.
~ Data Analysis [Step 1]
Each of the 96 bins is checked to see if there has been user activity and marked active or dormant.
Any instability during inactive or dormant periods is ignored as the end-user is unlikely to have been affected by this.
Uptime is calculated from the active bins and any data indicative of instability during these periods is normalised.
Errors and resyncs are normalised to the uptime. This is calculated by dividing the total time in seconds which the respective line has been in synchronisation and in active use over the past 24 hour period of the monitoring by the number of re-trains or errors recorded in that period. The two algorithms used are:
- MTBE (Mean Time Between Errors) = Connection uptime / Code Violations (Errors)
- MTBR (Mean Time Between Retrains) = Connection uptime / No of retrains
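The two formulas above can be sketched as follows. This is a minimal illustration rather than BT's implementation: the tuple layout and function names are hypothetical, and returning infinity when no events were recorded is an assumption made to avoid dividing by zero.

```python
import math


def mean_time_between(uptime_seconds, event_count):
    """Uptime divided by event count. Assumption: no events gives infinity."""
    if event_count == 0:
        return math.inf
    return uptime_seconds / event_count


def normalise(bins):
    """Compute (MTBE, MTBR) from a day's bins.

    Each bin is a tuple (active, uptime_seconds, errors, retrains).
    Only active bins contribute, since dormant periods are ignored.
    """
    active_bins = [b for b in bins if b[0]]
    uptime = sum(b[1] for b in active_bins)
    errors = sum(b[2] for b in active_bins)
    retrains = sum(b[3] for b in active_bins)
    return mean_time_between(uptime, errors), mean_time_between(uptime, retrains)
```

With two active bins of 900 seconds each carrying 3 errors and 1 retrain between them, plus one dormant bin full of errors, the dormant bin is dropped and the result is an MTBE of 600 and an MTBR of 1800.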
This step is done by the element manager with data obtained from the Data Collectors and the information passed to the Management Device RAMBo each day.
As well as MTBE & MTBR line data, the element manager also produces an event data file which is used to monitor for Wide Area Events and forced retrains. This event data is recorded as an array of each 15 min period in binary format [Uptime, Retrains, Errors]. For example a line which has uptime and errors but no retrains will record [1,0,1].
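The per-bin encoding described above amounts to three yes/no flags. A small sketch, assuming each flag is simply "was the value non-zero in this bin" (the function name is hypothetical):

```python
def event_triple(uptime_seconds, retrains, errors):
    """Encode one 15 minute bin as [Uptime, Retrains, Errors] flags."""
    return [int(uptime_seconds > 0), int(retrains > 0), int(errors > 0)]
```

A bin with 900 seconds of uptime, no retrains and 12 errors encodes as [1, 0, 1], matching the example in the text.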
~ Preparing to Categorise the Line [Step 2]
1. Check for Wide Area Events
Each day, the DLM Management Device receives sets of data from the DSLAM's element manager. First it will analyse the event data from all lines to check for events such as thunderstorms which may have caused multiple lines to resync and/or generate lots of errors.
If a pre-determined percentage of lines experience retrains and errors in the same time frame then any events occurring in that time frame will be classed as a Wide Area Event.
Documentation would suggest that the percentage values for wide area events are: >20% of users with uptime experienced a resync OR >50% of users with uptime experiencing errors && >10% of users with uptime experienced a resync.
So attempting to put it in simple terms, if data in the binary file in any of the 15 min bins at the same time frame meets any one of the following two criteria:
- > 20% of bins are [1,1,1] OR [1,1,0]
- > 50% of bins are [1,0,1] AND >10% of bins are [1,1,1]||[1,1,0]
then a wide area event is declared for that period. Data from any bin within the corresponding time frame is discarded and not used for the DLM calculation.
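The two criteria can be written as a short check over every line's flags for the same 15 minute bin. This is a sketch only: the percentages are the ones the documentation suggests, the function name is hypothetical, and taking the percentages over lines that had uptime in the bin is an assumption based on the "users with uptime" wording.

```python
def is_wide_area_event(triples):
    """Decide whether one 15 minute time frame is a wide area event.

    triples holds one [uptime, retrain, error] flag list per line,
    all taken from the same bin.
    """
    lines_up = [t for t in triples if t[0] == 1]
    if not lines_up:
        return False
    total = len(lines_up)
    # [1,1,1] or [1,1,0]: the line resynced in this bin.
    resync_share = sum(1 for t in lines_up if t[1] == 1) / total
    # [1,0,1]: errors recorded but no resync.
    errors_only_share = sum(1 for t in lines_up if t[1] == 0 and t[2] == 1) / total
    return resync_share > 0.20 or (errors_only_share > 0.50 and resync_share > 0.10)
```

Bins flagged this way would then be dropped for every line in the group before MTBE and MTBR are worked out.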
2. Check for Unforced Retrains
An unforced retrain is when the End User has turned off or powered down the modem. BT does not use the dying gasp, instead preferring to assume that an unforced retrain has occurred when the modem has remained powered down for 'x' period of time.
To check if the modem has remained powered down, it can use information from the event data file. If it detects that a line has retrained in any particular bin, then the preceding bin is checked to see if 0 was recorded for uptime.
If the preceding bin had 0 uptime then it is assumed that the retrain was an unforced event and will be discounted by the DLM calculation.
3. Get Stability Level.
The Service Provider is identified and the Level of Stability selected for the line is obtained.
~ Categorising the Line - ILQ Indicative Line Quality. [Step 3]
Using the MTBE & MTBR data, the Management Device will categorise the line using the relevant stability level metrics. Either one of the MTBE or MTBR data can trigger the DLM to apply a (further) step to increase line stability.
Below are tables showing the MTBR and MTBE thresholds for each Stability Option*
WBC ADSL/ADSL2+ Line Categorisation Thresholds
Stability Option | MTBR red threshold | MTBR green threshold | MTBE red threshold | MTBE green threshold |
Standard | 8,640 | 16,800 | 5 | 250 |
Stable | 16,800 | 33,600 | 300 | 6,000 |
Super Stable | 33,600 | 67,200 | 3,600 | 60,000 |
wef Apr 2014
NGA FTTC Line Categorisation Thresholds
Stability Option | Retrain threshold | MTBR green threshold | MTBE red threshold | MTBE green threshold |
Speed | 20 | 8400 | 30 | 300 |
Standard | 10 | 16800 | 180 | 600 |
Stable | 5 | 33600 | 360 | 3600 |
wef Jun 2012
MTBR update 5/15
MTBE update 6/21
Using the above thresholds, there are four possible categories into which a line may be classified: Very Poor, Poor, OK and Good.
Example
Standard DLM Algorithm for line categorisation
Stability | Metric | Good | | per day | OK | | per day | Poor | Very Poor |
GEA Speed | Retrains | mtbr≥8400 | | 10 | >4200 && <8400 | | 20 | mtbr<4200 | >10 per hour |
GEA Speed | Errors | mtbe≥300 | | 288 | >30 && <300 | | 2880 | mtbe<30 | |
GEA Standard | Retrains | mtbr≥16800 | | 5 | >8400 && <16800 | | 10 | mtbr<8400 | >10 per hour |
GEA Standard | Errors | mtbe≥600 | | 144 | >180 && <600 | | 480 | mtbe<180 | |
GEA Stable | Retrains | mtbr≥33600 | | 2 | >16800 && <33600 | | 5 | mtbr<16800 | |
GEA Stable | Errors | mtbe≥3600 | | 24 | >360 && <3600 | | 240 | mtbe<360 | |
BTw Aggressive (Standard) | Retrains | mtbr≥16800 | 4.66 hr | 5 | >8640 && <16800 | 2.4 hr | 10 | mtbr<8640 | >10 per hour |
BTw Aggressive (Standard) | Errors | mtbe≥250 | 4.16 m | 345 | >5 && <250 | 5 s | 17280 | mtbe<5 | |
BTw Normal (Stable) | Retrains | mtbr≥33600 | 9.33 hr | 2.5 | >16800 && <33600 | 4.66 hr | 5 | mtbr<16800 | >10 per hour |
BTw Normal (Stable) | Errors | mtbe≥6000 | 1.66 hr | 14 | >300 && <6000 | 5 m | 288 | mtbe<300 | |
BTw Stable (Super Stable) | Retrains | mtbr≥67200 | 18.6 hr | 1 | >33600 && <67200 | 9.33 hr | 2.5 | mtbr<33600 | >10 per hour |
BTw Stable (Super Stable) | Errors | mtbe≥60000 | 16.6 hr | 1.4 | >3600 && <60000 | 1 hr | 24 | mtbe<3600 | |
Two examples:
1). If a line is operating at Standard Stability and the mean time between retrains over the day is less than 8640 seconds (2.4 hrs), which equates to more than 10 retrains per day, OR if the mean time between errors is less than 5 seconds of active uptime (>17280 per day), then the line would be classified as poor.
2). If a line is operating at Standard Stability and the mean time between retrains over the day is at least 16800 seconds (4.66 hrs), which equates to no more than about 5 retrains per day, AND if the mean time between errors is at least 250 seconds of active uptime (<345 per day), then the line would be classified as good.
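The categorisation can be sketched using the WBC ADSL/ADSL2+ thresholds tabulated earlier. This is a simplified illustration, not BT's code: the names are hypothetical, the Very Poor category (more than 10 retrains per hour) is left out because how it is measured is not documented here, and treating a value exactly on a red threshold as OK is an assumption.

```python
# MTBR/MTBE thresholds in seconds, copied from the WBC table (wef Apr 2014).
WBC_THRESHOLDS = {
    "Standard":     {"mtbr_red": 8640,  "mtbr_green": 16800, "mtbe_red": 5,    "mtbe_green": 250},
    "Stable":       {"mtbr_red": 16800, "mtbr_green": 33600, "mtbe_red": 300,  "mtbe_green": 6000},
    "Super Stable": {"mtbr_red": 33600, "mtbr_green": 67200, "mtbe_red": 3600, "mtbe_green": 60000},
}

SECONDS_PER_DAY = 24 * 60 * 60


def categorise(mtbr, mtbe, option="Standard"):
    """Return 'Poor', 'OK' or 'Good' for a line on the given stability option."""
    t = WBC_THRESHOLDS[option]
    # Either metric below its red threshold is enough to mark the line poor.
    if mtbr < t["mtbr_red"] or mtbe < t["mtbe_red"]:
        return "Poor"
    # Both metrics must reach their green threshold for the line to be good.
    if mtbr >= t["mtbr_green"] and mtbe >= t["mtbe_green"]:
        return "Good"
    return "OK"


def events_per_day(threshold_seconds):
    """Convert a mean-time threshold into events per day of continuous uptime."""
    return SECONDS_PER_DAY / threshold_seconds
```

The per-day figures quoted in the table fall out of the second function: events_per_day(8640) is 10 retrains, events_per_day(5) is 17280 errored seconds, and events_per_day(250) is roughly 345. They only hold for a line that was up and in use for the full 24 hours.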
When the line category has been obtained, the DLM system will move on to the next step to check if any changes to the DLM profile need to be made.
*These figures are provided in good faith and may not necessarily be the most up to date.
~ Making Changes to the DLM Profile.
When a line has been categorised, the Management Device checks to see if any changes to the DLM profile need to be applied.
DLM Action Status
Line Classification | ILQ Status | Action |
Good - Performing beyond expectations | Green | Check if can remove/reduce any of the DLM parameters. |
Performing within acceptable params | Amber | No changes will be made to the DLM profile. |
Poor MTBR/MTBE | Red | The system will apply a further DLM step to increase stability. |
Poor MTBE upstream | Crimson | The system will apply a further DLM step to increase stability. |
Rapid Retrains | Scarlet | The system will undertake additional line monitoring so that immediate changes to profiles may be made. |
No DLM data | Grey | No action. |
Insufficient DLM data | Black | Day's uptime was less than 15 mins. No action. |
Up until this point, the DLM process for ADSL1, ADSL2+ and FTTC is very similar. Any changes the DLM system makes now depend on the product type, as each of these has different parameters which may be adjusted.
The individual product parameters will be discussed in more depth on a separate page but a summary is shown below:
Parameters which may be adjusted by the DLM
Product | SNR Margin | Interleaving | INP | Capping/Banding |
ADSL1 | Yes - 6-15 dB | ON/OFF | NO | Extreme circumstances |
Example profile: on 9 6 off
ADSL2+ | Yes - 3-15 dB | OFF/Low/Med delay | INP - 0/1/2 | Yes. UC = Uncapped |
Example profile: WBC 160K - 24M Medium delay (INP 1) 15dB Downstream, UC Medium delay (INP 2) 6dB Upstream (ADSL2+)
FTTC | No - Fixed 6dB | OFF/Low/High | G.INP - | Yes |
Example profile: 0.128M-10M Downstream, Retransmission Low - 0.128M-1.3M Upstream, Error Protection Off
~ Removal of DLM intervention - Reversal of Interleaving & Error Protection steps.
Unfortunately very little is known about this part of the process. What we do know is that the line must be achieving ILQ green status for 'x' period of time. The period of time varies, and this is deliberate to ensure that a line doesn't flap between profiles. There has been mention of a 'doubler' method and although this would also make sense from what we have observed, there is no hard evidence that this is fact.
DLM is usually quite forgiving for a first time intervention and the line will go down one step after a full day of stability. This fact has been borne out by many users on our forums many times over. I've even experienced it myself first hand several times.
- Case One: Testing a new router for a manufacturer. Firmware had a bug with the bitswap process and caused the line to have interleaving applied. Router was swapped out and interleaving removed after full day of MTBE green.
- Case Two: Day 1 - Line fault caused massive amount of Errored Seconds. Day 2 - DLM applied interleaving, but high Err/Secs still continued. Day 3 - DLM applied INP, Err/Secs continue. Day 4 - DLM applied more interleaving. Err Secs continued to exceed MTBE red. Day 5 - DLM increased INP but fault found at remote location and was fixed. Day 6 - Line stable no ErrSecs. Day 7 - DLM reduces INP. Day 8 - DLM reduces amount of interleaving. Day 8 - DLM totally removes INP. Day 9 - DLM totally removes interleaving.
It would appear to use some sort of doubler method for each individual intervention, so the more times a line sees DLM action, the longer it takes for it to be removed.
- Example: Day 1 - Line ILQ exceeds MTBE red. Day 2 - DLM applies interleaving. Day 3 - No errors MTBE Green. Day 4 - DLM removes interleaving, but line immediately starts to see errors and goes MTBE red. Day 4 - DLM reapplies interleaving. ILQ status Amber. Day 5 - ILQ status green. Day 6 - DLM takes no action and waits further period. It is at this stage where additional line monitoring is possibly performed to ensure other line parameters such as SNRm are reasonably stable before making the decision to remove interleaving.
~ DLM reset
With ADSL/ADSL2+ products, it is possible for the ISP to reset the DLM.
For NGA products (Fibre), the ISP cannot perform a reset; this can only be done by a BT Openreach engineer after clearance of a line fault.
Update 2019 - The ISP can request a DLM reset from Openreach without having to call out an engineer, but only if the DLM appears stuck and the line has been deemed stable for a suitable period of time.
~ Note by author
This page has been compiled after months of research and countless hours reading all available information about BT's DLM.
Things ground to a halt just prior to the ASSIA court case and since then it has been nigh on impossible to get any new information about the BT DLM or changes made since that date. The ASSIA court case appears to center around BT's ILQ system and the process steps and decisions undertaken to change and reverse DLM steps, which is why there is little information about this stage.
I had hoped that in time, new information would come to light, but 8 months later still nothing. Although this page has been here for a while pending an update, it has been requested several times that I publish what information I do have. All pieces are not there when it comes to the ILQ, but afaik the data analysis steps still remain exactly the same and hence publication now. If more information does ever become available then I will update.
©kitz 2014
last update Jun 2015
Updated 6/21. New DLM params