Web tracking investigation site

grahamn · 29 September 2020 04:10

There’s a new site at https://themarkup.org/blacklight which allows you to enter a URL and it will show all the trackers and other nasties on there. I’ve had a look through a few of the more common news sites and it’s rather depressing the number and type of them on there. The sites in the Murdoch stable seem to be the worst as they include key loggers and screen recorders.

However, I found the CHOICE site to be unacceptable given their stated aim for consumers. I’m guessing the running of the site is mostly outsourced and therefore the editorial / reporting team may not be aware of what exactly is installed. I’d appreciate if it could be investigated and any clarifications given.

Gregr · 29 September 2020 04:45

Oops…I scanned themarkup.org and got patronizingly told off.

person · 29 September 2020 05:00

In what way?

They explicitly acknowledge that you scanned their own site but other than that what disturbs you?

person · 29 September 2020 05:06

There are two separate potential problems.

site operation outsourced
site built using off-the-shelf software

Either of those can introduce all sorts of trackers that Choice staff may not be aware of. Choice will have to speak to whether either or both of those apply.

choice.community is a separate web site and itself has a couple of tracker failures - but nowhere near as bad as choice.com.au

Gregr · 29 September 2020 05:09

Because the “search” came back straight away, no scan being done there, with a big message saying in essence, see, we are clean of all the things we check for. Not that I’m saying this isn’t a good tool to use, but really the nature of the Internet is the way it is. I even scanned VIC.GOV.AU and it came back with potential problems.

grahamn · 29 September 2020 05:21

The site seems to cache previous scans so the first time someone checks a URL it takes time. The next person gets the results immediately. I haven’t checked what cache expiry time they use.
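To illustrate the caching behaviour I am guessing at, here is a minimal sketch in Python. It is not how Blacklight is actually built; the class name `ScanCache`, the `scan` callback and the expiry value are all made up for the example.

```python
import time


class ScanCache:
    """Keep scan results per URL and treat them as stale after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._entries = {}          # url -> (stored_at, result)

    def get_or_scan(self, url, scan):
        """Return (result, served_from_cache) for the given URL."""
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1], True   # fresh entry: answer immediately
        result = scan(url)          # slow path: run the real scan
        self._entries[url] = (now, result)
        return result, False
```

With something like this, the first person to check a URL pays for the scan and everyone else gets the stored result until the expiry time passes.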

The vic.gov.au site uses Google Analytics. That is used as a common means of seeing which parts of a site are the most popular. GA is a decent tool and has the advantage of being free. The downside is that the information it produces is tracked by Google. There are other analytics tools you can install which have a greater level of privacy but the Google ones tend to be used because everyone else is already using them.

I’m not familiar with HotJar - the other tracker used in the Vic site - but I’m wary of any site that records keystrokes and mouse movement as that opens up too many possible security attack vectors.

person · 29 September 2020 05:33

Definitely.

I have occasionally stuffed up a “cut and paste” and ended up pasting confidential information into the wrong window and form e.g. pasting confidential information into this post.

If choice.community were using keystroke recording then that confidential information is potentially already sent away on the internet even if I immediately notice my mistake and delete the confidential information.

BrendanMays · 29 September 2020 06:32

Hi @grahamn,

Thanks for your questions about the CHOICE website.

Our site is managed in house, not outsourced, although our editorial and digital operations teams are distinct.

I realise that online privacy is an important issue, including for many users of this forum. That’s why we have this detailed privacy policy on our site to explain what types of data we collect and why. For example, like most sites, we use tools like website analytics that will show us non-identifiable data like traffic to our site, what articles are popular and so on.

There’s no recording of key logging in the way it’s being discussed here. Using the previous example, if you accidentally copy and paste something into our site, this would not result in this information being stored and potentially used later for nefarious purposes. We do have a search function on our website, which I think in the past has copped some criticism for its functionality. I would need to check the finer points with our developers but they may look at some form of searches performed on our site to improve the functionality, but once again this would be deidentified or handled with appropriate care in the event someone had accidentally searched their own personal info, much like other sensitive information we occasionally collect, such as payment information when people sign up for a membership.

It’s good to be aware and active regarding our online privacy, hopefully you’ll find many useful threads like this one that help. I would also encourage a careful approach when dealing with online privacy tools, some are good but others will use fear as a sales tool or (ironically) as a way to extract info. We have reviewed some of these on the choice.com.au site as well. If you don’t find what you need, please start a thread or get back to me if you have further questions.

Gregr · 29 September 2020 08:44

Thanks for the response. There are lots of things going on behind the covers in Web sites that make the whole thing work better, help admins, and make the user experience better. Getting all worked up about possible issues with what a Website uses and allows just unnecessarily scares people.

draughtrider · 29 September 2020 11:33

Re patronising - yes I was a little confused as well … the ‘result’ I got was (you might need to click this image to make it big enough to read):

I didn’t really feel ‘ised’ in any way, patron or otherwise …

… scanning localhost will nearly always be faster than anything else, with maybe some esoteric or unusual exceptions. I’d expect it to be essentially instant on any respectable compute platform.

Have you verified this against the logs of a webserver you have access to? That aside, the caching could also be external to the site performing the analysis - this would arguably be a little slack of them, but potentially out of their control depending on the structure of their webhost - if it were the case, a shrewd site admin would find another host, assuming the caching was happening close to home (for them). Similarly it is not inconceivable they have made a choice on this for economic reasons - if the cache timeout was hours, not days++ for example, then it’s probably a reasonably safe bet and only fails for some poor punter who is using it as a cleanliness test for a site they are developing themselves.

… which basically nails it from my perspective - capability does not prove intent. If a scrupulous and honourable entity abides by their policy, even if data is capable of being collected or is actually collected because the tools just ‘grab it all’, what matters is if and how it is stored and used. If the privacy policy is clear and the entity abides by it then all is well. Of course the entity could be hacked and maybe raw data more than is used could be leaked, and how many permutations of maybes of bad events cascading does one concern oneself about? There is a limit to what I worry about.

It is an interesting site and tool - thank you for bringing it to community attention!

person · 30 September 2020 04:03

The problem with this argument is the use of third parties. Choice is a respected and trusted organisation but the multi-party web site that is choice.com.au is bringing in a whole host of additional organisations who have their own privacy policies and whose respect and trust level may well be lower (much lower). Many of them (all of them?) would also not be subject to Australian law.

A more controversial line of thinking is that by passively accepting this kind of web site design, Choice is endorsing it and supporting it (perhaps financially) regardless of whether these organisations ever breach anyone’s privacy specifically within or via the Choice web site. These organisations are creating a world of Surveillance Capitalism and, to the extent that that is a bad thing, Choice is complicit.

I suppose the ultimate test is … if I use my web browser to block all these Surveillance Capitalists, does the Choice web site still work? If “yes” then blocking is an option available to me - but many people will not know to do this, if they care. If “no” then Choice would be forcing me to be a participant in Surveillance Capitalism.

Gregr · 30 September 2020 04:16

Let us know how your blocking goes, and I guess if you feel so strongly about what really happens in the WEB world you will not visit Choice again if blocking doesn’t fix your issue.

grahroll · 30 September 2020 07:02

Both websites still work with blocking in my experience. I run NoScript, uBlock Origin and Privacy Badger, among other more specific blockers for Facebook etc. The only domains I have set to Trusted in NoScript are “choice.community” for the Community and “choice.com.au” for the Choice site. Anything Google gets banned, including “fonts.googleapis.com”.
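For anyone wanting to see which hosts a page pulls in besides the site itself, here is a rough Python sketch. It is not how NoScript or any other blocker decides things; the function name `third_party_hosts` is made up, and it assumes you already have the list of resource URLs (for example copied from the browser's network tab). Subdomains of the first-party domain are treated as first party, which is a simplification.

```python
from urllib.parse import urlsplit


def third_party_hosts(first_party_domain, resource_urls):
    """Return the sorted hosts in resource_urls that are outside first_party_domain."""
    base = first_party_domain.lower()
    hosts = set()
    for url in resource_urls:
        host = (urlsplit(url).hostname or "").lower()
        if not host:
            continue  # relative URL: served by the page's own host
        if host == base or host.endswith("." + base):
            continue  # the site itself or one of its subdomains
        hosts.add(host)
    return sorted(hosts)
```

Each host in the result is a candidate for blocking; whether the page still works without it has to be tested by hand.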

draughtrider · 30 September 2020 10:51

Yes, completely valid point … but this is ‘the internet’ - to me, it seems ‘abandon all hope, ye who enter here …’ - ie I’m not sure I should expect any policy or statement by anyone to protect me, it’s entirely up to me to be held responsible regardless (or irregardless) of how much I ‘can be’ in a practical sense. I ack this is an incredibly unhelpful perspective for a lot of people, especially those who are less computer savvy and/or believe there is hope in the privacy/etc domain - I do feel a lot of empathy toward people who really care about this and think privacy/secrecy/etc is actually still achievable - it should be, but my view is, it isn’t. That horse has not just bolted from the stable, it’s been grazing for years, recaptured, and now in a tube of glue …

grahroll · 30 September 2020 11:38

I find the blocking of some stuff eg through my Pi-Hole and via NoScript etc has a benefit in the amount of Ads I get to my email addresses and reduces the Ads I see on pages. I reduce what I send to the players and this reduces rubbish at my end. Secrecy and privacy as a whole is well and truly dead but as isolated reductions of “spam” it helps to be a bit more secure and a bit more private. So I agree we are using old hoof glue but I try to not let some new stuff be made.

person · 30 September 2020 23:23

Well that gave me a laugh anyway.

You may be right that privacy is dead on the internet but Choice still has a choice. Use, for example, Google Analytics or not.

I suspect that the problem here is that Choice does not in practice have a choice - because the web site is built using a bunch of off-the-shelf tools, and those tools mean that privacy-fail is built in from the ground up.

You will note the comment from themarkup.org about how difficult it was to build a privacy-safe web site.

I find that Notepad does the job - but you get really basic 1990s style web sites.

It’s not as “black and white” as that.

@grahamn has raised a valid issue - as well as drawing our attention to a useful tool.

On closer inspection, he is right. choice.com.au is swimming in third party trackers. On that basis, I would rate the site as: could do (much) better.

Whether they make the choice to do better is up to them - but there are plenty of shades of grey between the current situation and the ideal situation.

Do you know off the top of your head how Pi-Hole is doing blocking? DNS poisoning to non-existent IP address? DNS poisoning to itself and then rejecting all HTTP requests? Something else?

The absolute worst would be a site that records keystrokes in a password field - so that if you start to type the password for the wrong web site (or wrong thing generally), you are sending them the prefix of this other password even if you don’t click to Submit.

grahroll · 30 September 2020 23:42

A mix of Blacklists and Whitelists. The Pi-Hole in its default blocking mode sends the blacklisted domain requests to the “unspecified” address, ie 0.0.0.0 (IPv4) and :: (0:0:0:0:0:0:0:0) for IPv6.

There is a second mode, which I don’t use, that only allows client requests over IPv4 and, for IPv6 requests, sends the result “NODATA-IPV6”. In that mode the blocked requests are answered with the Pi-Hole’s IPv4 address.

Blocklists are updated regularly and currently block way over 100,000 domains.
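As a rough picture of the default mode described above, here is a small Python sketch of the decision a DNS sinkhole makes. It is not Pi-Hole's actual code; the function `answer_query` and the `upstream` callback are invented for the example, and passing other record types straight through is a simplification made here.

```python
def answer_query(domain, record_type, blocklist, upstream):
    """Answer blocked A/AAAA queries with the unspecified address, else ask upstream."""
    name = domain.lower().rstrip(".")   # normalise case and trailing dot
    if name in blocklist:
        if record_type == "A":
            return "0.0.0.0"            # IPv4 unspecified address
        if record_type == "AAAA":
            return "::"                 # IPv6 unspecified address
    # Not blocked (or a record type this sketch does not handle): resolve normally.
    return upstream(name, record_type)
```

The client then tries to connect to an address that goes nowhere, so the tracker or ad simply fails to load.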

One thing I found very necessary on first setting up the Pi-Hole was to reduce the logging to the barest minimum otherwise the microSD card quickly becomes “dead” and I have directed the minimal log files to RAM. I don’t know if you @draughtrider have noticed/done the same?

draughtrider · 30 September 2020 23:57

Same. The detailed logging was interesting but not really necessary so I have minimal also on both pi-holes. I do have some extra disk on each (4 TB on one and 10 TB on the other) for fileshare, so I guess the logging could go there, but I don’t miss it.

Mine are reporting just over 130k blocked sites and about 13% of queries blocked …

person · 1 October 2020 00:00

This.

I don’t run Pi-Hole but I have other 24x7 Pi applications and, yes, minimal logging. Not only does this reduce wear on the µSD card but also I have found that the Pi does not handle power fail well (unless you have the Pi on a UPS to avoid that ‘ever’ occurring).

In some setups it may be OK or even advisable to direct logging to another host (for the above two reasons and for security reasons). That is a general comment however. I don’t know whether Pi-Hole even supports that.

grahroll · 1 October 2020 00:06

It’s easily doable for the logs, I redirected mine to RAM and yes my Pi sits on a UPS system that it shares with other hardware. The other Pi we have in our home is our Media station and has its own but much smaller UPS brick. We have upgraded to the Pi 4 B (8GB) units both for the RAM and the WiFi.
