Data quality assessment

  • Data quality assessment

    I'd like to hack together some code to assess the quality and consistency of quality of received data, as opposed to the quantity, and I'm keen for community feedback about how to go about that.

    Aside from being useful for evaluating different antennae, this might be of some use to FR24 dev/ops, if they aren't already doing something like this, because bad data are sometimes worse than no data.

    My receiver is self-assembled. It's a Raspberry Pi 3 running dump1090-mutability, so I get certain data that may or may not be available to other receivers, and I may be missing out on data that others get. I'm currently using a generic SDR with no preamp or bandpass, but a FA + LNA + BP is on its way to replace it. My antenna is a home-brew 8-segment co-lin, which does something like 3× to 5× better in terms of distance and message count than the 1/4w whip it replaced.

    Here's what I've got so far as indicators of poor signal quality:
    • Bad and unknown Mode-S messages as a proportion of received Mode-S preambles
    • Proportion of tracks that are single point
    • Signal strength, though I don't see this as being especially useful


    Bad Mode-S messages are, well, bad, probably too many bit errors. Unknown messages might be bad, but there's no way of knowing. My figures for each are about 60% and 39% respectively, so the vast majority of received messages are being thrown away.

    Single-point tracks are probably indicative of patchy coverage which, in turn, probably indicates poor quality data. My figure averages around 90% single-point.
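The single-point figure can be reproduced from any list of decoded positions. Below is a minimal sketch, assuming a "track" is simply every position report that shares an ICAO address (dump1090's own track bookkeeping, with timeouts, may count differently). The function name and sample addresses are made up for illustration.

```python
from collections import Counter

def single_point_fraction(position_reports):
    """Fraction of aircraft seen with exactly one position report.

    position_reports: iterable of ICAO hex addresses, one entry per
    decoded position message (assumption: one track per address).
    """
    counts = Counter(position_reports)
    if not counts:
        return 0.0
    single = sum(1 for n in counts.values() if n == 1)
    return single / len(counts)

# Illustrative data only: three aircraft, two of them seen once.
reports = ["4CA2D1", "4CA2D1", "40621D", "A1B2C3", "4CA2D1"]
print(f"{single_point_fraction(reports):.0%} single-point")
```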

    Those figures look a bit grim, which is why I want to drill into SQ a bit more. Despite the better antenna, it's mounted in a loft rather than outside but there's no immediate prospect of doing anything about that. The other possible explanation is that there's a cell base station maybe 100 yards away so GSM900 noise and perhaps 4G noise is probably adversely affecting the receiver. Maybe the bandpass filter will help there.

    All of the above depend on data provided by dump1090 and thus probably aren't portable.

    The idea I have in mind — and I'm interested to know if anybody has any others — is to look at the distribution of elapsed time between position reports for each a/c (ICAO address) on the basis that each a/c probably transmits position messages on a fairly periodic basis (active interrogation aside). A receiver getting good data should show a fairly narrow distribution of timings, where a receiver missing messages because of poor quality data will show a wider distribution. The quality, therefore, is inversely proportional to the standard deviation of that distribution.
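As a rough illustration of the interval idea, here is a sketch that pools the gaps between consecutive position reports across all aircraft and reports their mean and standard deviation. The pooling, the function name and the sample data are my assumptions for the example, not necessarily how the final plugin does it.

```python
import statistics

def interval_stats(reports):
    """Mean and sample standard deviation of per-aircraft report gaps.

    reports: iterable of (timestamp_seconds, icao) tuples for position
    messages. Gaps are measured per aircraft, then pooled together.
    Returns None if there are fewer than two gaps.
    """
    last_seen = {}
    gaps = []
    for ts, icao in sorted(reports):
        if icao in last_seen:
            gaps.append(ts - last_seen[icao])
        last_seen[icao] = ts
    if len(gaps) < 2:
        return None
    return statistics.mean(gaps), statistics.stdev(gaps)

# Illustrative data: aircraft "A" loses several messages before t=3.0.
sample = [(0.0, "A"), (0.5, "A"), (1.0, "A"), (3.0, "A"),
          (0.2, "B"), (0.7, "B")]
mean, sd = interval_stats(sample)
print(f"mean gap {mean:.3f} s, s.d. {sd:.3f} s")
```

A single long gap dominates the standard deviation, which is the effect the quality measure relies on, and also why it will be sensitive to aircraft that simply fly out of range.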

    An alternative approach would be to look at the distribution of distance travelled between position reports, normalised for reported ground speed. The advantage of this is that the quality of the timestamps becomes irrelevant, a source of error that is probably not insignificant given the latency of rx to decode (where I assume the timestamps get recorded).

    A secondary benefit (to FR24) is that only lat/lon/alt/gs are required, not timestamps which means this processing can be done in FR24's infrastructure, if that is desirable.

    The disadvantage to this approach is that Vincenty (or even Haversine) computations are much more expensive than a simple timestamp subtraction.
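For a sense of what the distance-based variant involves, here is a sketch of a Haversine distance and one possible normalisation. The "displacement ratio" definition used here (distance covered divided by the distance expected in one nominal 0.5 s reporting interval at the reported ground speed) is my own assumption for illustration, as are the function names and the spherical Earth radius.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius, spherical approximation
KNOT_MS = 1852.0 / 3600.0      # one knot in metres per second

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

def displacement_ratio(prev, curr, gs_knots, nominal_interval_s=0.5):
    """Distance between two fixes, in units of 'expected distance per report'.

    prev, curr: (lat, lon) tuples in degrees. gs_knots: reported ground speed.
    nominal_interval_s is an assumed reporting period, not a measured one.
    Returns None when the ground speed gives no usable expectation.
    """
    expected = gs_knots * KNOT_MS * nominal_interval_s
    if expected <= 0:
        return None
    return haversine_m(*prev, *curr) / expected

print(displacement_ratio((51.50, -0.10), (51.50, -0.09), gs_knots=450))
```

A ratio near 1 would mean consecutive reports were received; larger values suggest reports were missed in between, without any reference to timestamps.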

    What do you think?

    If others are interested in trying out this code I'll make it available, as well as the Munin plugins I wrote for dump1090-mutability (and compatible). I'm currently writing it in Python fed by SBS TCP port, so it should run on any platform including Windows. I'd be actively interested in other peoples' quality data for comparison purposes, especially those who get decent ranges or who are known to have good quality installs.

  • #2
    Originally posted by Strix technica View Post
    An alternative approach would be to look at the distribution of distance travelled between position reports, normalised for reported ground speed.

    [...]

    I'm currently writing it in Python fed by SBS TCP port, so it should run on any platform including Windows. I'd be actively interested in other peoples' quality data for comparison purposes, especially those who get decent ranges or who are known to have good quality installs.
    Update: Having written half of it, I now discover that position and velocity are transmitted in separate messages. This means that position and velocity are never known simultaneously, so there must be some degree of error (indeed, uncertainty) in such measurements. Much the same as Heisenberg, really.

    In absolute terms, the error will be small, but so is the period/distance over which they relate.
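One way to live with the split is to carry forward the most recent ground speed for each aircraft and attach it to the next position. The sketch below assumes the commonly documented SBS/BaseStation field layout (MSG type in field 2, ICAO hex in field 5, ground speed in field 13, lat/lon in fields 15 and 16, counting from 1); the sample lines are hand-made, not captured data.

```python
def pair_position_with_velocity(lines):
    """Yield (icao, lat, lon, last_known_gs) for each SBS position message.

    MSG,4 (velocity) updates the remembered ground speed; MSG,3 (position)
    is emitted with whatever speed was last heard, or None if none yet.
    """
    last_gs = {}
    for line in lines:
        f = line.strip().split(",")
        if len(f) < 16 or f[0] != "MSG":
            continue
        icao = f[4]
        if f[1] == "4" and f[12]:
            last_gs[icao] = float(f[12])
        elif f[1] == "3" and f[14] and f[15]:
            yield icao, float(f[14]), float(f[15]), last_gs.get(icao)

# Hand-made sample lines in SBS format (illustrative values only).
sample = [
    "MSG,4,1,1,4CA2D1,1,2017/05/04,14:12:01.000,2017/05/04,14:12:01.000,,,450,90,,,0,,,,,",
    "MSG,3,1,1,4CA2D1,1,2017/05/04,14:12:01.400,2017/05/04,14:12:01.400,,35000,,,51.5,-0.1,,,0,0,0,0",
]
for fix in pair_position_with_velocity(sample):
    print(fix)
```

The speed attached to a fix is therefore always slightly stale, which is exactly the error described above.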

    Again, it'd be fascinating to see what the message feed from a reliable source looks like. I don't suppose FR24 or any independent contributor would like to give me a sample of known-good data (SBS format, please)?

    Comment


    • #3
      Update 2: I've been running the alpha-quality code for a bit under a day. Early results show average 5-minute mean intervals of ~40 seconds and an average 5-minute mean displacement ratio† of about 7:1, suggesting that a lot of messages are still getting lost.

      The standard deviations, especially of message interval, are huge at times, but I'm beginning to think that significant message loss is going to be inevitable (so I'll have to think of a way to mitigate that or otherwise figure out how to turn the mean and s.d. figures into some measure of quality), for two reasons:

      1. Even a good receiver that can hear a/c 250 nm out will still inevitably experience lost messages that far out because distant rx is hard and prone to bit errors, and

      2. ADS-B transmits on a single frequency (unlike CDMA mobiles or trunked radios) and has nothing like CSMA/CD (Carrier Sense Multiple Access with Collision Detection, used on old-school Ethernet).

      Instead, a/c transmit† at a random interval of 400-600 ms with significant power (75-500 W). That means there are going to be a lot of collisions, particularly near congested airports, and that's going to contribute to message loss.
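To get a feel for the scale of the collision problem, here is a back-of-envelope sketch using the textbook pure-ALOHA result: a frame survives if no other transmission starts within one frame time either side of it, giving a probability of exp(-2G) for offered load G. The assumptions are mine: roughly 2 messages per second per aircraft (from the 400-600 ms interval above), 120 µs per extended squitter (8 µs preamble plus 112 bits at 1 Mbit/s), Poisson arrivals, and no allowance for other 1090 MHz traffic or for the stronger signal winning a collision.

```python
import math

ES_FRAME_S = 120e-6  # extended squitter: 8 us preamble + 112 us of data

def clear_probability(n_aircraft, msgs_per_s_each=2.0, frame_s=ES_FRAME_S):
    """Pure-ALOHA estimate that one frame is not overlapped by any other.

    n_aircraft: aircraft within receiver range (assumed equal rates).
    msgs_per_s_each: assumed transmissions per second per aircraft.
    """
    others = max(n_aircraft - 1, 0)
    offered_load = others * msgs_per_s_each * frame_s  # frames per frame-time
    return math.exp(-2.0 * offered_load)

for n in (10, 100, 500):
    print(n, round(clear_probability(n), 3))
```

In practice the real channel also carries Mode A/C and short Mode S replies, so this is likely to be optimistic near busy airports.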

      A Simulation of Signal Collisions over the North Atlantic for a Spaceborne ADS-B Receiver Using Aloha Protocol
      Last edited by Strix technica; 2017-05-04, 14:12. Reason: add PDF reference

      Comment


      • #4
        Code and instructions are here:



        ETA: I'd welcome it if others with known-good feeds (eg highly ranked on FR24 stats) ran the distribution plugin (or give me access to their dump1090 SBS port) to give me some idea of how my data compare with theirs.

        The way things are going and for the reasons described before and also on the GitHub page, distribution analysis is not going to be enough alone; I'm going to have to develop an index (similar to the stats score) for the results to be particularly meaningful.
        Last edited by Strix technica; 2017-05-04, 19:52.

        Comment


        • #5
          Mostly over my head.

          But I can potentially help with a southern hemisphere 30003 feed for a bit if required. Mainly traffic between 1800 and 1100 utc (if I calculated my difference right)
          Posts not to be taken as official support representation - Just a helpful uploader who tinkers

          Comment


          • #6
            Originally posted by Oblivian View Post
            But I can potentially help with a southern hemisphere 30003 feed for a bit if required. Mainly traffic between 1800 and 1100 utc (if I calculated my difference right)
            That'd be brilliant, thanks, and it'll also go some way toward getting some raw data toward JohnSunnyhills's point (who is also in NZ, though I don't know where) about lack of a/c activity at all at certain times of the day.

            Ideally, I'd like to collect 8 whole days of data, preferably 14, just to establish what weekly periodic cycles there may be — but I'll take what I can get. Thanks again!

            Based on your FR24 shared stats and assuming you're live 24x7, it seems you average 3.4 hours a day without any a/c. That's somewhat consistent with my own data from European space; things go quiet from around midnight for about 4–6 hours depending on where you measure it. I never get no data because of my location, but if the pattern is consistent then it might form the basis for some reasonable uptime compensation calculation.

            If you would ping me an email with the IP to connect to, I'll reply with the IP I'll be connecting from. If your dump1090 machine doesn't have a public IP, I can provide you with a tunnel to a public IP and show you how to set it up (again, by email).

            (As an aside, I knew the NZHN area pretty well, though not since I got interested in aviation and I still have relatives there. I made it down your way and to Otago the once, which was pretty awesome.)

            Comment


            • #7
              Strix
              I would like to utilise the tools that you made available in this link

              Originally posted by Strix technica View Post
              Code and instructions are here:



              ETA: I'd welcome it if others with known-good feeds (eg highly ranked on FR24 stats) ran the distribution plugin (or give me access to their dump1090 SBS port) to give me some idea of how my data compare with theirs.

              The way things are going and for the reasons described before and also on the GitHub page, distribution analysis is not going to be enough alone; I'm going to have to develop an index (similar to the stats score) for the results to be particularly meaningful.
              However, even though I have been in IT for 45+ years I would still appreciate some guidance to proceed.

              I would like to setup the ADSB monitoring tools on a RaspberryPi3 with standard setup using a NOOBS SD card.
              I simply downloaded the FR24feed software from the FR24 website and used that.

              Can I now use your tools? I don't have a clue whether I am running Dump1090 monitor or not.

              Can you please guide me through the installation instructions....

              Comment


              • #8
                Originally posted by JohnSunnyhills View Post
                I would like to utilise the tools that you made available in this link
                Great! Don't forget to do a periodic git pull (or download fresh zip) because this code will probably remain under active development for some time. I pushed some improvements to the aircraft count/distance graphs just yesterday.

                I am not 100% certain that my adsb_msg_dist plugin is bug-free. I'm seeing huge standard deviations on message intervals, much larger than I would have thought likely. On the other hand, the standard deviation for normalised displacement ratio seems quite plausible. I haven't yet had a chance to drill into that to figure out why.

                Originally posted by JohnSunnyhills View Post
                However, even though I have been in IT for 45+ years I would still appreciate some guidance to proceed.

                I would like to setup the ADSB monitoring tools on a RaspberryPi3 with standard setup using a NOOBS SD card.
                I simply downloaded the FR24feed software from the FR24 website and used that.

                Can I now use your tools? I don't have a clue whether I am running Dump1090 monitor or not.

                Can you please guide me through the installation instructions....
                Assuming you have reasonable skills with the shell, sure, no problem. IT's a big field, and Unix system administration can be as specialised as any other branch. I'd be pretty green were I to administer a Windows AD, for example.


                Figuring out whether you're running dump1090

                I don't know whether the FR24feed image or software uses dump1090, but it should be fairly easy to find out.

                First point your web browser at http://raspberry.local:8754/settings.html (or whatever your pi is called). This is the configuration page for fr24feed and will tell you what fr24feed's source is. Mine says "AVR (TCP)". Any of the TCP options is a good hint that it's using dump1090. If it's using any of the USB options, your fr24feed is almost certainly doing its own decoding.

                In the latter case, there's no particular reason you shouldn't install dump1090 and reconfigure your fr24feed to use that instead, but it's extra work you may not be willing to do.

                Then see if there's a dump1090 web server already running. Try http://raspberry.local/ (again substituting your pi's address) and see what happens. It should be obvious whether what you get (if anything) is dump1090. If the result is negative, that doesn't mean you're not running dump1090, only that it isn't running a web interface on the default port.

                Shell into the pi and do
                Code:
                ps ax|grep dump1090
                If there's a dump1090 process running, you should see it in the output there. (NB: the grep command itself often shows up in the process list so take care not to confuse that with an active dump1090 process).


                Required configuration for dump1090

                Once we're reasonably sure that you've got some dump1090 instance running, we need to know a bit about how your dump1090 instance is configured.

                Code:
                ls /run/dump1090*.pid
                provided that shows some output, try the following, substituting /run/dump1090-mutability.pid as appropriate:

                Code:
                sudo cat /proc/$(cat /run/dump1090-mutability.pid)/cmdline|tr '\0' '\n'
                If you get a bunch of output, then you know there's a dump1090 instance running and you can see its configuration arguments. If you get no output, you may have to do some further digging to figure out what's going on (assuming fr24feed is not using a built-in decoder).
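The same check can be scripted. This is a minimal sketch, assuming the pid file path used above and a Linux /proc filesystem; the helper names and the sample argument list are illustrative, not taken from any particular install.

```python
from pathlib import Path

def parse_cmdline(raw: bytes):
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    return [arg.decode(errors="replace") for arg in raw.split(b"\x00") if arg]

def option_value(args, flag):
    """Return the argument that follows flag, or None if flag is absent."""
    if flag in args:
        i = args.index(flag)
        if i + 1 < len(args):
            return args[i + 1]
    return None

def read_cmdline(pid_file="/run/dump1090-mutability.pid"):
    """Read the running instance's arguments (may need root, as with sudo cat)."""
    pid = Path(pid_file).read_text().strip()
    return parse_cmdline(Path(f"/proc/{pid}/cmdline").read_bytes())

# Illustrative argument list; your own instance will differ.
raw = b"\x00".join([b"/usr/bin/dump1090-mutability", b"--net",
                    b"--net-sbs-port", b"30003",
                    b"--write-json", b"/run/dump1090-mutability"]) + b"\x00"
args = parse_cmdline(raw)
print("SBS port:", option_value(args, "--net-sbs-port"))
print("JSON dir:", option_value(args, "--write-json"))
```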

                My adsb_msg_dist, which is the plugin I'm most interested in getting feedback from, should run with any dump1090 because it relies on only the SBS output port. That'll either be specified on the dump1090 command line or the default (typically 30003) will be used.

                My dump1090_ plugin won't work with all dump1090s, only fairly recent ones that are configured to write out JSON data. Look in the output above for the argument --write-json. If it's present, then you should find a bunch of .json files in the directory named in the following argument.

                If it's not, then try
                Code:
                dump1090 --help
                look for --write-json in the output. The version and fork/variant of your dump1090 should be printed near the top right of the output, and it would be worth noting that. Likewise, the default SBS port will be mentioned and you'll need that if it's not specified on the command line.

                If --write-json isn't there, then you may need to add it. Look in /etc/default/dump1090-mutability or similar; in my copy, there's an option JSON_DIR. Ensure that's set (it's not by default in -mutability's fork).
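Once JSON output is on, a quick way to confirm it is working is to read aircraft.json from that directory. The sketch below assumes the file holds an "aircraft" array whose entries carry "hex" and, when a position is known, "lat" and "lon"; that is my understanding of -mutability's output and should be checked against your own files. The inline sample document is made up.

```python
import json

def summarise_aircraft(doc):
    """Count aircraft, and those with a decoded position, in an aircraft.json dict."""
    aircraft = doc.get("aircraft", [])
    with_pos = [a for a in aircraft if "lat" in a and "lon" in a]
    return {"total": len(aircraft), "with_position": len(with_pos)}

# Made-up sample; on a real system use something like
#   doc = json.load(open("<JSON_DIR>/aircraft.json"))
sample = json.loads("""
{"now": 1493907120.0, "messages": 1234,
 "aircraft": [
   {"hex": "4ca2d1", "lat": 51.5, "lon": -0.1, "seen": 0.4},
   {"hex": "40621d", "seen": 12.0}
 ]}
""")
print(summarise_aircraft(sample))
```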

                At the bottom of that same file, there is probably an EXTRA_ARGS setting that you can use to pass arbitrary extra arguments. Mine is set to:

                Code:
                EXTRA_ARGS="--phase-enhance --oversample --aggressive --mlat"
                Whether the first three will have any effect depends on the variant of the fork and how it was compiled. Aggressive mode is not enabled by default in the actual build, so you'd have to build from sources having edited the makefile.

                It seems that without --mlat, your instance is probably not taking part in MLAT calculations even if you have MLAT turned on in fr24feed. This is because, without this option, dump1090 doesn't output timestamps in AVR format which is the format fr24feed's configurator automatically selected.

                If you make any changes to this file, either reboot or (making the appropriate substitution for the service name)
                Code:
                sudo systemctl restart dump1090-mutability
                You don't have to have a running dump1090 webserver for my dump1090_ to work (although it's an excellent way of viewing what your particular instance is doing), you only need the JSON files the webserver would use if you want to use my dump1090_ plugin. Follow mutability's instructions (scroll down) for installing/configuring lighttpd if this is not already running and you want it.


                Installing my plugins

                You can either download my plugins as a zip file or use git. If you want to use git, do:

                Code:
                sudo aptitude install git
                cd /usr/local
                sudo git clone https://github.com/strix-technica/ADSB-tools.git .
                The first line will do nothing if git is already installed. To update, do
                Code:
                sudo git pull
                if you used git or download the zip and unpack it in /usr/local. Git's the easier way because it'll only download changes, if any.

                From there, the installation instructions for Munin and the plugins are laid out in my git repository (scroll down).

                That's probably all for now. Post any specific questions here and I'll try to help.

                Comment


                • #9
                  The default FR24 install uses a customised pull of Malcolm Robb's version, dump1090-MR, with its own parameters specified.

                  Recently the option to use streams from other dump1090 versions was added (data capture shifted to a standard format, and the error checking/filtering/bundling of the data passed to the FR24feed process instead, I believe).

                  Then came the ability to detect existing versions, to ensure it didn't try to fire up a second instance and override the user's setup decisions by attempting to use the wrong option.

                  The issue arises when the device is NOT a DVB unit that requires dump1090.

                  I run a test Pi for 'stuff' only now and then; my primary uploader is a Windows laptop with a Mode-S Beast and DPD high-gain antenna. When I launch the feeder, it doesn't seem to spawn dump1090 but uses the native uploader app itself, with the original coding that the legacy uploaders (Java and simple .exe) had to tap into their preferred receiver types before the SDRs came along.


                  Which raises the issues you briefed on earlier regarding MLAT and stats from it. The last we heard, beast MLAT was still a work in progress. And sure enough doesn't appear to occur during my upload process from what I can tell. The uploader then outputs BS 30003 & Raw pass-through that my other apps (planeplotter, MilmodeS, MYSQL logging etc) are tapping into downstream.

                  As for changing the output data format to Beast instead of AVR for MLAT capture when using a DVB device, I've not fiddled to see whether that is in fact the case in the upload console window, or whether everyone is just assuming that the Beast format's time tagging is passed in binary and is therefore friendlier for the uploader they have written.

                  To date, the legacy how-to PDFs and such all say to use DVBT to launch the native Dump1090-MR (or the one specified in the config page) on start, or AVRTCP data for Dump1090-MU, and so on. I could see nothing public to the contrary from Piotr or any other devs confirming that the binary mode can also be read.
                  Last edited by Oblivian; 2017-05-09, 21:59.
                  Posts not to be taken as official support representation - Just a helpful uploader who tinkers

                  Comment


                  • #10
                    Originally posted by Oblivian View Post
                    The default FR24 install uses a customised pull of Malcolm Robb's version.
                    That's slightly unfortunate; it would appear that dump1090-MR doesn't support JSON output, but anybody who wants to run my suite of plugins badly enough (email me if you want to see an example, I won't publish the link here) can switch to -mutability. Given what you say below, it should (almost) be a matter of dropping in -mutability as a direct replacement provided it's configured the same.

                    Who knows, -mutability might give better results given its oversampling and phase enhancement amongst other goodness, and people might improve their stats by switching (if they've got the Unix chops to do so). Certainly -mutability seems much better maintained; -MR hasn't been touched since 2014, where -mutability was last committed to in February. There might even be a case for FR24 to switch from -MR to -mutability, other than risk and inertia.

                    The message distribution plugin (the subject of this thread) only needs an SBS feed which, AFAIK, all dump1090 variants support so no problem there.

                    Originally posted by Oblivian View Post
                    Then came the ability to detect existing versions, to ensure it didn't try to fire up a second instance and override the user's setup decisions by attempting to use the wrong option.
                    As you say, the modern fr24feed plays nicely with pre-existing ADS-B decoders like -mutability. Installed the deb and it just worked, no fuss.

                    Originally posted by Oblivian View Post
                    Which raises the issues you briefed on earlier regarding MLAT and stats from it. The last we heard, beast MLAT was still a work in progress. And sure enough doesn't appear to occur during my upload process from what I can tell.
                    Interesting. MrMac seemed pretty confident that Beast was the preferred option but, being binary, it's not so easy to see exactly what it's doing without rummaging around in source code. I think I prefer AVR, at least you know what you're getting with that.

                    It also sounds like MLAT is not as advanced as many suppose it to be. The only obvious problem that I can see is that fr24feed has an MLAT option to turn on, but doesn't complain if it doesn't get the timestamps it requires to do MLAT (eg --mlat flag to dump1090). I filed an issue report (#161802, if you have access to the support system) concerning that.

                    Alas, there doesn't appear to be any way to positively tell that MLAT feed is working properly in a given instance. All I can say is that turning --mlat on in dump1090 didn't appear to break my existing feed even though it materially changed the AVR output stream.

                    We'll see what reply, if any, I get from FR24's team. It goes without saying that FR24's dev/ops teams are overloaded, as such teams always are. This might not be a high priority for them, especially if a substantial proportion of their feeder base uses the FR24 bundle.

                    Originally posted by Oblivian View Post
                    As for changing the output data format to Beast instead of AVR for MLAT capture when using a DVB device, I've not fiddled to see whether that is in fact the case in the upload console window, or whether everyone is just assuming that the Beast format's time tagging is passed in binary and is therefore friendlier for the uploader they have written.
                    The conversions do not cost enough to be significant; strtoull() and friends are very standard libc calls. And yeah, assuming binary is better is unsafe.

                    What's this upload console window? The stdout output from fr24feed? If so, that probably correlates with /var/log/fr24feed.log. And no, there's nothing helpful in there in terms of MLAT (including lack of timestamp data which was the point of my support case).

                    Originally posted by Oblivian View Post
                    To date, the legacy how-to PDFs and such all say to use DVBT to launch the native Dump1090-MR (or the one specified in the config page) on start, or AVRTCP data for Dump1090-MU, and so on. I could see nothing public to the contrary from Piotr or any other devs confirming that the binary mode can also be read.
                    That being the case, then fr24feed absolutely depends on getting timestamp information from the decoder of choice, whichever variant of dump1090 is being used. For MLAT, that requires --mlat, at least with AVR. Who knows with binary Beast.

                    I should imagine that RTL-based radars are very common just because of their low cost. If so, then improperly configured dump1090 instances might be a priority for MLAT for the dev/ops team. Their call, of course.

                    Comment


                    • #11
                      I agree that it's far from obvious from fr24feed's behaviour whether it works as configured or not. You need to check the log file to get some info.


                      Originally posted by Oblivian View Post
                      To date, the legacy how-to PDFs and such all say to use DVBT to launch the native Dump1090-MR (or the one specified in the config page) on start, or AVRTCP data for Dump1090-MU, and so on. I could see nothing public to the contrary from Piotr or any other devs confirming that the binary mode can also be read.
                      I have configured many feeders and all are using a standard approach:

                      dump1090-mutability producing Beast output on 30005
                      Fr24feed connects to localhost:30005 using beast-tcp

                      They all work fine and the log confirms that mlat data is generated.
                      The /etc/fr24feed.ini file should look like this:

                      receiver="beast-tcp"
                      fr24key="yourfeederkey"
                      host="localhost:30005"
                      bs="no"
                      raw="no"
                      logmode="2"
                      windowmode="0"
                      mpx="no"
                      mlat="yes"
                      mlat-without-gps="yes"
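For anyone wanting to check an existing install against that layout, here is a small sketch that parses the key="value" lines and flags differences from the Beast-over-TCP setup listed above. The parser and the `expected` dictionary are illustrative only; the file has no section headers, so it is parsed by hand rather than with configparser.

```python
def parse_fr24feed_ini(text):
    """Parse simple key="value" lines into a dict, ignoring anything else."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings

def differences(settings, expected):
    """Return {key: (actual, wanted)} for every setting that does not match."""
    return {k: (settings.get(k), v) for k, v in expected.items()
            if settings.get(k) != v}

# Values taken from the listing above; on a real system read /etc/fr24feed.ini.
expected = {"receiver": "beast-tcp", "host": "localhost:30005", "mlat": "yes"}
sample = 'receiver="avr-tcp"\nhost="localhost:30002"\nmlat="yes"\n'
print(differences(parse_fr24feed_ini(sample), expected))
```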


                      /M
                      F-ESDF1, F-ESGG1, F-ESGP1, F-ESNK1, F-ESNV2, F-ESNV3 F-ESSL4, F-ESNZ7, F-LFMN3
                      T-ESNL1, T-ESNL2, T-ESGR15
                      P-ESIA, P-ESIB, P-ESGF, P-ESSN, P-EFMA
                      mrmac (a) fastest.cc

                      Comment


                      • #12
                        Originally posted by MrMac View Post
                        I agree that it's far from obvious from fr24feed's behaviour whether it works as configured or not. You need to check the log file to get some info.

                        They all work fine and the log confirms that mlat data is generated.
                        Can you show an example of what the log should look like when definitely feeding MLAT data to fr24? All I get is:

                        Code:
                        2017-05-10 10:27:09 | [mlat][i]Pinging the server
                        2017-05-10 10:27:09 | [mlat][i]Stats 16385891/0
                        2017-05-10 10:27:29 | [mlat][i]Received ADS-B time references AC:
                        2017-05-10 10:27:29 | [mlat][i] <ICAO Address>
                        2017-05-10 10:27:29 | [mlat][i] <ICAO Address>
                        ...
                        and similar. There's the occasional "No ADS-B time reference available (0/4)" but no positive indication of transmitting MLAT information. This is no different from before I added --mlat to dump1090.
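To compare logs before and after a configuration change without eyeballing them, something like the following sketch could tally lines per subsystem tag and pull out the two numbers on the "Stats" lines. The line format is inferred from the excerpt above only, and what the two Stats numbers mean is not documented here, so the code just reports them.

```python
import re
from collections import Counter

# Format inferred from the excerpt: "<date> <time> | [tag][level]message"
LINE_RE = re.compile(r"^(\S+ \S+) \| \[(\w+)\]\[(\w)\](.*)$")
STATS_RE = re.compile(r"Stats (\d+)/(\d+)")

def tally_log(lines):
    """Count log lines per tag, e.g. 'mlat' or 'feed'."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[m.group(2)] += 1
    return counts

def mlat_stats(lines):
    """Yield (a, b) integer pairs from '[mlat][i]Stats a/b' lines."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(2) == "mlat":
            s = STATS_RE.search(m.group(4))
            if s:
                yield int(s.group(1)), int(s.group(2))

sample = [
    "2017-05-10 10:27:09 | [mlat][i]Pinging the server",
    "2017-05-10 10:27:09 | [mlat][i]Stats 16385891/0",
    "2017-05-10 10:27:29 | [mlat][i]Received ADS-B time references AC:",
    "2017-05-10 22:55:12 | [feed][i]sent 1,0 AC",
]
print(dict(tally_log(sample)), list(mlat_stats(sample)))
```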

                        Thanks,

                        Comment


                        • #13
                          Originally posted by Strix technica View Post
                          Can you show an example of what the log should look like when definitely feeding MLAT data to fr24? All I get is:

                          Code:
                          2017-05-10 10:27:09 | [mlat][i]Pinging the server
                          2017-05-10 10:27:09 | [mlat][i]Stats 16385891/0
                          2017-05-10 10:27:29 | [mlat][i]Received ADS-B time references AC:
                          2017-05-10 10:27:29 | [mlat][i] <ICAO Address>
                          2017-05-10 10:27:29 | [mlat][i] <ICAO Address>
                          ...
                          and similar. There's the occasional "No ADS-B time reference available (0/4)" but no positive indication of transmitting MLAT information. This is no different from before I added --mlat to dump1090.

                          Thanks,
                          The time reference is a good indicator. It usually shows up as an error, which makes people with a small range question it; it means there are some non-ADS-B extended squitters but no full ADS-B reference to provide the data it needs to send off.

                          Actually, the presence of any MLAT output at all is an indicator. I don't get that with a Beast.
                          Last edited by Oblivian; 2017-05-10, 10:06.
                          Posts not to be taken as official support representation - Just a helpful uploader who tinkers

                          Comment


                          • #14
                            Hah.

                            So Windows + beast + feeder via USB = no mlat

                            Pi + fr24feed windows raw out via TCP =

                            2017-05-10 22:55:08 | [mlat][i]Registering MLAT station
                            2017-05-10 22:55:08 | [mlat][i]Registering MLAT station: SUCCESS
                            2017-05-10 22:55:10 | [mlat][i]Received ADS-B time references AC:
                            2017-05-10 22:55:10 | [mlat][i] C822E9
                            2017-05-10 22:55:12 | [feed][i]sent 1,0 AC
                            2017-05-10 22:55:17 | [mlat][i]Pinging the server
                            2017-05-10 22:55:17 | [mlat][i]Stats 0/0
                            2017-05-10 22:55:18 | [feed][i]sent 1,0 AC

                            Go figure
                            Posts not to be taken as official support representation - Just a helpful uploader who tinkers

                            Comment


                            • #15
                              Interesting idea to check quality. I recently upgraded from the standard TV DVB type to an orange FA dongle. I got 50% more aircraft tracked and a threefold increase in position reports. I had to upgrade the Pi to a Pi 3, as the single-core CPU was overloaded given the increase to c. 600,000 positions per day!

                              This got me thinking somewhat along the lines you have described. I have started to wonder how many "data packets" are lost because of collisions or because they are drowned out by stronger transmissions. I am also wondering if a pair of antennas, each covering 180 degrees, would throw any light on this: if the merged data from 2 dongles and 2 antennas results in significantly more decoded data, that could also be a pointer to the issue. I don't know if anyone has tried that option. If you know, can you post a link? We would only get the benefit of this analysis from a very busy site.

                              I will download and have a look at your code.

                              Roger

                              Comment
