• homebrewer an hour ago

    This is a good time to mention that dnsmasq lets you set up several DNS servers and race them. The first responder wins, so you won't ever notice one of the services being down:

      all-servers
      server=8.8.8.8
      server=9.9.9.9
      server=1.1.1.1
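
    A minimal Python sketch of the racing idea described above: every upstream is queried in parallel and the first successful answer wins. This is not how dnsmasq is implemented; `race` and the resolver callables passed to it are hypothetical stand-ins for real DNS queries, so the idea can be shown without network access.

```python
import concurrent.futures


def race(resolvers, name, timeout=2.0):
    """Ask every upstream at once and return (label, answer) from the first success.

    `resolvers` maps a label (for example an upstream IP) to a callable that
    takes a name and returns an answer or raises on failure. Both are
    hypothetical placeholders for real DNS lookups.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(resolvers))
    futures = {pool.submit(query, name): label for label, query in resolvers.items()}
    try:
        # as_completed yields futures in the order they finish.
        for future in concurrent.futures.as_completed(futures, timeout=timeout):
            if future.exception() is None:
                return futures[future], future.result()
        raise RuntimeError("all upstreams failed")
    finally:
        # Do not wait for the slower upstreams; their answers are discarded.
        pool.shutdown(wait=False)
```

    An upstream that is down simply loses the race, which is why a single provider outage goes unnoticed as long as one other upstream answers.
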
    • anthonyryan1 42 minutes ago

      Additionally, as long as you don't set strict-order, dnsmasq will automatically use all-servers for retries.

      If you were using systemd-resolved however, it retries all servers in the order they were specified, so it's important to interleave upstreams.

      Using the servers in the above example, and assuming IPv4 + IPv6:

          1.1.1.1
          2001:4860:4860::8888
          9.9.9.9
          2606:4700:4700::1111
          8.8.8.8
          2620:fe::fe
          1.0.0.1
          2001:4860:4860::8844
          149.112.112.112
          2606:4700:4700::1001
          8.8.4.4
          2620:fe::9
      
      will fail over faster and more successfully on systemd-resolved than if you specify all Cloudflare IPs together, then all Google IPs, etc.

      Also note that Quad9 filters by default on this IP while the other two do not, so you could get intermittent differences in resolution behavior. If this is a problem, don't mix filtered and unfiltered resolvers. You definitely shouldn't mix DNSSEC-validating and non-validating resolvers if you care about that (all of the above are DNSSEC validating).
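
      The ordering above can be generated rather than written by hand. A minimal Python sketch, assuming the rule is "rotate through providers and alternate IPv4/IPv6 on every step"; `interleave_upstreams` is a hypothetical helper, not part of systemd-resolved:

```python
from collections import deque
from ipaddress import ip_address


def interleave_upstreams(providers):
    """Order addresses so that neighbours differ in provider and IP family.

    `providers` is a list of address lists, one list per provider.
    """
    # One queue per provider and address family, preserving input order.
    queues = []
    for addresses in providers:
        by_family = {4: deque(), 6: deque()}
        for address in addresses:
            by_family[ip_address(address).version].append(address)
        queues.append(by_family)

    remaining = sum(len(addresses) for addresses in providers)
    ordered = []
    step = 0
    while remaining:
        by_family = queues[step % len(queues)]  # rotate through providers
        wanted = 4 if step % 2 == 0 else 6      # alternate IPv4 / IPv6
        other = 6 if wanted == 4 else 4
        # Fall back to the other family if the wanted one is used up.
        source = by_family[wanted] or by_family[other]
        if source:
            ordered.append(source.popleft())
            remaining -= 1
        step += 1
    return ordered
```

      Called with the Cloudflare, Google and Quad9 addresses in that provider order, this reproduces the list above; the result could then be joined with spaces for the `DNS=` line in resolved.conf.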

      • mnordhoff 44 minutes ago

        Even without "all-servers", dnsmasq will race servers frequently (after 20 seconds, unless it's changed), and when retrying. A sudden outage should only affect you for a few seconds, if at all.

      • v5v3 5 hours ago

        > For many users, not being able to resolve names using the 1.1.1.1 Resolver meant that basically all Internet services were unavailable.

        Don't you normally have two DNS servers listed on any device? So was the second also down? If not, why didn't it go to that?

        • rom1v 5 hours ago

          On Android, in Settings, Network & internet, Private DNS, you can only provide one in "Private DNS provider hostname" (AFAIK).

          Btw, I really don't understand why it does not accept an IP (1.1.1.1), so you have to give a hostname (one.one.one.one). It would be more sensible to configure a DNS server from an IP rather than from a hostname that itself has to be resolved by a DNS server :/

          • quacksilver 5 hours ago

            Private DNS on Android refers to 'DNS over HTTPS' and would normally only accept a hostname.

            Normal DNS can normally be changed in your connection settings for a given connection on most flavours of Android.

            • fs111 2 hours ago

              No, it is not DNS over HTTPS it is DNS over TLS, which is different.

              • lxgr an hour ago

                Android 11 and newer support both DoH and DoT.

              • eptcyka 4 hours ago

                Cloudflare has valid certs for 1.1.1.1

                • quaintdev 5 hours ago

                  It's DNS over TLS. Android does not support DNS over HTTPS except for Google's DNS

                  • KoolKat23 an hour ago

                    As far as I understand it, it's Google or Cloudflare?

                    • lxgr an hour ago

                      It does since Android 11.

                    • rom1v 5 hours ago

                      > Private DNS on Android refers to 'DNS over HTTPS'

                      Yes, sorry, I did not mention it.

                      So if you want to use DNS over HTTPS on Android, it is not possible to provide a fallback.

                      • ignoramous 2 hours ago

                        > So if you want to use DNS over HTTPS on Android, it is not possible to provide a fallback.

                        Not true. If the (DoH) host has multiple A/AAAA records (multiple IPs), any decent DoH client would retry its requests over multiple or all of those IPs.

                        • lxgr an hour ago

                          Does Cloudflare offer any hostname that also resolves to a different organization’s resolver (which must also have a TLS certificate for the Cloudflare hostname or DoH clients won’t be able to connect)?

                          • ignoramous an hour ago

                            Usually, for plain old DNS, primary and secondary resolvers are from the same provider, serving from distinct IPs.

                            • lxgr an hour ago

                              Yes, but you were talking about DoH. I don’t know how that could plausibly work.

                              • ignoramous 36 minutes ago

                                > but you were talking about DoH

                                DoH hosts can resolve to multiple IPs (and even different IPs for different clients)?

                                Also see TFA

                                  It's worth noting that DoH (DNS-over-HTTPS) traffic remained relatively stable as most DoH users use the domain cloudflare-dns.com, configured manually or through their browser, to access the public DNS resolver, rather than by IP address. DoH remained available and traffic was mostly unaffected as cloudflare-dns.com uses a different set of IP addresses.
                  • Macha 3 hours ago

                    Cloudflare's own suggested config is to use their backup server 1.0.0.1 as the secondary DNS, which was also affected by this incident.

                    • stingraycharles 3 hours ago

                      TBH at this point the failure modes in which 1.1.1.1 would go down and 1.0.0.1 would not are not that many. At Cloudflare’s scale, it’s hardly believable that a single one of these DNS servers would go down; it’s more likely a large-scale system failure.

                      But I understand why Cloudflare can’t just say “use 8.8.8.8 as your backup”.

                      • bombcar an hour ago

                        At least some machines/routers do NOT have a primary and backup but instead randomly round-robin between them.

                        Which means that you’d be on cloudflare half the time and on google half the time which may not be what you wanted.

                    • sschueller 3 hours ago

                      Yes, I would also highly recommend using a DNS closest to you (for those that have ISPs that don't mess around (blocking etc.) with their DNS you usually get much better response times) and multiple from different providers.

                      If your device doesn't support proper failover use a local DNS forwarder on your router or an external one.

                      In Switzerland I would use Init7 (isp that doesn't filter) -> quad9 (unfiltered Version) -> eu dns0 (unfiltered Version)

                      • Gieron 5 hours ago

                        I think normally you pair 1.1.1.1 with 1.0.0.1 and, if I understand this correctly, both were down.

                        • moontear 4 hours ago

                          Just pair 1.1.1.1 with 9.9.9.9 (Quad9) so you have fault tolerance in terms of provider as well.

                          • rvnx 3 hours ago

                            Quad9 is reselling the traffic logs, so it means if you connect to secret hosts (like for your work), they will be leaked

                            • daneel_w an hour ago

                              Could you show a citation? Your statement completely opposes Quad9's official information as published on quad9.net, and what's more it doesn't align at all with Bill Woodcock's known advocacy for privacy.

                              • Demiurge an hour ago

                                Is this true? They claim that they don't keep any logs. Do you have a source?

                              • baobabKoodaa 2 hours ago

                                Windows 11 does not allow using this combination

                                • antonvs 15 minutes ago

                                  You can use it, you just need to set the DNS over HTTPS templates correctly, since there's an issue with the defaults it tries to use when mixing providers.

                                  The templates you need are:

                                  1.1.1.1: https://cloudflare-dns.com/dns-query

                                  9.9.9.9: https://dns.quad9.net/dns-query

                                  8.8.8.8: https://dns.google/dns-query

                                  See https://learn.microsoft.com/en-us/windows-server/networking/... for info on how to set the templates.

                                  • lxgr an hour ago

                                    How so? Does it reject a secondary DNS server that’s not in the same subnet or something similar?

                                    • antonvs 14 minutes ago

                                      It's using DNS over HTTPS, and it doesn't default the URL templates correctly when mixing (some) providers. You can set them manually though, and it works.

                                    • snickerdoodle12 an hour ago

                                      Huh? Did they break the primary/secondary DNS server setup that has been present in all operating systems for decades?

                                      • antonvs 13 minutes ago

                                        DNS over HTTPS adds a requirement for an additional field - a URL template - and Windows doesn't handle defaulting that correctly in all cases. If you set them manually it works fine.

                                        • snickerdoodle12 8 minutes ago

                                          What does that have to do with plain old dns?

                                  • Algent 4 hours ago

                                    Yeah, pretty much. In a perfect world you would pair it with another service, I guess, but usually you use the official backup IP because it's not supposed to break at the same time.

                                    • carlhjerpe 4 hours ago

                                      I would rather fall back to the slow path of resolving through root servers than fall back from one recursive resolver to another.

                                    • rvnx 3 hours ago

                                      8.8.8.8 + 1.1.1.1 is stable and mostly safe

                                      • baobabKoodaa 2 hours ago

                                        Windows 11 does not allow using this combination

                                    • Bluescreenbuddy an hour ago

                                      Yup. I have Cloudflare and Quad9

                                      • zamadatix 5 hours ago

                                        1.1.1.1 is also what they call the resolver service as a whole, the impact section (seems to) be saying both 1.0.0.0/24 and 1.1.1.0/24 were affected (among other ranges).

                                        • bmicraft 4 hours ago

                                          My Mikrotik router (and afaict all of them) doesn't support more than one DoH address.

                                          • ahoka 3 hours ago

                                            Or run your own, if you are able to.

                                            • rat9988 5 hours ago

                                              Not all users have configured two DNS servers?

                                              • quacksilver 5 hours ago

                                                It is highly recommended to configure two or more DNS servers in case one is down.

                                                I would count not configuring at least two as 'user error'. Many systems require you to enter a primary and alternate server in order to save a configuration.

                                                • tgv 3 hours ago

                                                  The default setting on most computers seems to be: use the (wifi) router. I suppose telcos like that because it keeps the number of DNS requests down. So I wouldn't necessarily see it as user error.

                                                • daneel_w an hour ago

                                                  OK. But there's no reason or excuse not to, if they already manually configured a primary.

                                              • jallmann 8 hours ago

                                                Good writeup.

                                                > It’s worth noting that DoH (DNS-over-HTTPS) traffic remained relatively stable as most DoH users use the domain cloudflare-dns.com, configured manually or through their browser, to access the public DNS resolver, rather than by IP address.

                                                Interesting, I was affected by this yesterday. My router (supposedly) had Cloudflare DoH enabled but nothing would resolve. Changing the DNS server to 8.8.8.8 fixed the issues.

                                                • bauruine 8 hours ago

                                                  How does DoH work? Somehow you need to know the IP of cloudflare-dns.com first. Maybe your router uses 1.1.1.1 for this.

                                                  • maxloh 5 hours ago

                                                    Yeah, your operating system will first need to resolve cloudflare-dns.com. This initial resolution will likely occur unencrypted via the network's default DNS. Only then will your system query the resolved address for its DoH requests.

                                                    Note that this introduces one query overhead per DNS request if the previous cache has expired. For this reason, I've been using https://1.1.1.1/dns-query instead.

                                                    In theory, this should eliminate that overhead. Your operating system can validate that it is really talking to that IP address by checking the Subject Alternative Name (SAN) field of the certificate presented by the DoH server: https://g.co/gemini/share/40af4514cb6e
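
                                                    A minimal Python sketch of that SAN check, assuming a certificate in the dictionary form returned by `ssl.SSLSocket.getpeercert()`. The TLS stack normally performs this check itself when you connect by IP with verification enabled; `cert_covers_ip` is a hypothetical helper that only illustrates what is being compared.

```python
import ipaddress


def cert_covers_ip(peercert, ip):
    """Return True if the certificate lists `ip` as an IP-address SAN entry.

    `peercert` is a dict shaped like the result of ssl.SSLSocket.getpeercert(),
    where "subjectAltName" is a sequence of (kind, value) pairs.
    """
    wanted = ipaddress.ip_address(ip)
    for kind, value in peercert.get("subjectAltName", ()):
        if kind != "IP Address":
            continue  # DNS names and other SAN types do not match an IP
        try:
            # Compare parsed addresses so different IPv6 spellings still match.
            if ipaddress.ip_address(value.strip()) == wanted:
                return True
        except ValueError:
            continue  # ignore entries that are not parseable addresses
    return False
```

                                                    A certificate that only lists DNS names would fail this check, which is why connecting to https://1.1.1.1/dns-query depends on the resolver's certificate carrying the IP itself.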

                                                    • stingraycharles 5 hours ago

                                                      Yeah I don’t understand this part either, maybe it’s supposed to be bootstrapped using your ISP’s DNS server?

                                                      • tom1337 5 hours ago

                                                        Pretty much that. You set up a bootstrap DNS server (could be your ISPs or any other server) which then resolves the IP of the DoH server which then can be used for all future requests.

                                                      • stavros 7 hours ago

                                                        Are we meant to use a domain? I've always just used the IP.

                                                        • landgenoot 5 hours ago

                                                          You need a domain in order to get the s in https to work

                                                          • bigiain 5 hours ago

                                                            That's not correct.

                                                            Let's Encrypt are trialling IP address HTTPS/TLS certificates right now:

                                                            https://letsencrypt.org/2025/07/01/issuing-our-first-ip-addr...

                                                            They say:

                                                            "In principle, there’s no reason that a certificate couldn’t be issued for an IP address rather than a domain name, and in fact the technical and policy standards for certificates have always allowed this, with a handful of certificate authorities offering this service on a small scale."

                                                            • noduerme 5 hours ago

                                                              right, this was announced about two weeks ago to some fanfare. So in principle there was no reason not to do it two decades ago? It would've been nice back then. I never heard of any certificate authority offering that.

                                                              • bombcar an hour ago

                                                                In the beginning of HTTPS you were supposed to look for the padlock to prove it was a safe site. Scammers wouldn’t take the time and money to get a cert, after all!

                                                                So certs were often tied with identity which an IP really isn’t so few providers offered them.

                                                                • fs111 an hour ago

                                                                  > I never heard of any certificate authority offering that.

                                                                  DigiCert does. That is where 1.1.1.1 and 9.9.9.9 get their valid certificates from

                                                              • yread 5 hours ago

                                                                what about certificate for IP address?

                                                                • landgenoot 5 hours ago

                                                                  What about a route that gets hijacked? There is no HSTS for IP addresses.

                                                                  • sathackr 4 hours ago

                                                                    Presumably the route hijacker wouldn't have a valid private key for the certificate so they wouldn't pass validation

                                                                • maxloh 5 hours ago

                                                                  Nope. That is not correct. https://1.1.1.1/dns-query is a perfectly valid DoH resolver address I've been using for months.

                                                                  Your operating system can validate that it is really talking to that IP address by checking the Subject Alternative Name (SAN) field of the certificate presented by the DoH server: https://g.co/gemini/share/40af4514cb6e

                                                                  • federiconafria 5 hours ago

                                                                    What about a reverse DNS lookup?

                                                                • ta1243 7 hours ago

                                                                  And even if you have already resolved it the TTL is only 5 minutes

                                                                • noduerme 5 hours ago

                                                                  Funny. I was configuring a new domain today, and for about 20 minutes I could only reach it through Firefox on one laptop. Google's DNS tools showed it active. SSH to an Amazon server that could resolve it. My local network had no idea of it. Flush cache and all. Turns out I had that one FF browser set up to use Cloudflare's DoH.

                                                                  • sneak 6 hours ago

                                                                    I disagree. The actual root cause here is shrouded in jargon that even experienced admins such as myself have to struggle to parse.

                                                                    It’s corporate newspeak. “legacy” isn’t a clear term, it’s used to abstract and obfuscate.

                                                                    > Legacy components do not leverage a gradual, staged deployment methodology. Cloudflare will deprecate these systems which enables modern progressive and health mediated deployment processes to provide earlier indication in a staged manner and rollback accordingly.

                                                                    I know what this means, but there’s absolutely no reason for it to be written in this inscrutable corporatese.

                                                                    • stingraycharles 5 hours ago

                                                                      I disagree, the target audience is also going to be less technical people, and the gist is clear to everyone: they just deploy this config from 0 to 100% to production, without feature gates or rollback. And they made changes to the config that weren’t deployed for weeks until some other change was made, which also smells like a process error.

                                                                      I will not say whether or not it’s acceptable for a company of their size and maturity, but it’s definitely not hidden in corporate lingo.

                                                                      I do believe they could have elaborated more on the follow-up steps they will take to prevent this from happening again. I don’t think staggered rollouts are the only answer to this; they’re just a safety net.

                                                                      • willejs 5 hours ago

                                                                        If you carry on reading, it's quite obvious they misconfigured a service and routed production traffic to that instead of the correct service, and the system used to do that was built in 2018 and is considered legacy (probably because you can easily deploy bad configs). Given that, I wouldn't say the summary is "inscrutable corporatese", whatever that is.

                                                                        • bigiain 5 hours ago

                                                                          I agree it's not "inscrutable corporatese"

                                                                          It's carefully written so my boss's boss thinks he understands it, and that we cannot possibly have that problem because we obviously don't have any "legacy components" because we are "modern and progressive".

                                                                          It is, in my opinion, closer to "intentionally misleading corporatese".

                                                                          • noduerme 5 hours ago

                                                                            Joe Shmo committed the wrong config file to production. Innocent mistake. Sally caught it in 30 seconds. We were back up inside 2 minutes. Sent Joe to the margarita shop to recover his shattered nerves. Kid deserves a raise. Etc.

                                                                            • sathackr 3 hours ago

                                                                              Yea the "timeline" indicating impact start/end is entirely false when you look at the traffic graph shared later in the post.

                                                                              Or they have a different definition of impact than I do

                                                                      • sathackr 3 hours ago

                                                                        Good writeup except the entirely false timeline shared at the beginning of the post

                                                                        • bartvk 3 hours ago

                                                                          You need to clarify such a statement, in my opinion.

                                                                        • Hamuko 6 hours ago

                                                                          My (Unifi) router is set to automatic DoH, and I think that means it's using Cloudflare and Google. Didn't notice any disruptions so either the Cloudflare DoH kept working or it used the Google one while it was down.

                                                                      • chrismorgan 7 hours ago

                                                                        I’m surprised at the delay in impact detection: it took their internal health service more than five minutes to notice (or at least alert) that their main protocol’s traffic had abruptly dropped to around 10% of expected and was staying there. Without ever having been involved in monitoring at that kind of scale, I’d have pictured alarms firing for something that extreme within a minute. I’m curious for a description of how and why that might be, and whether it’s reasonable or surprising to professionals in that space too.

                                                                        • perlgeek 6 hours ago

                                                                          There's a constant tension between speed of detection and false positive rates.

                                                                          Traditional monitoring systems like Nagios and Icinga have settings where they only open events/alerts if a check failed three times in a row, because spurious failed checks are quite common.

                                                                          If you spam your operators with lots of alerts for monitoring checks that fix themselves, you stress them unnecessarily and create alert blindness, because the first reaction will be "let's wait and see if it fixes itself".

                                                                          I've never operated a service with as much exposure as CF's DNS service, but I'm not really surprised that it took 8 minutes to get a reliable detection.
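
                                                                          A minimal Python sketch of the "three failed checks in a row" rule described above (Nagios and Icinga expose it as `max_check_attempts`); the class name and interface here are hypothetical:

```python
class ConsecutiveFailureAlert:
    """Open an alert only after `threshold` failed checks in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, check_ok):
        """Feed one check result; return True exactly when the alert opens."""
        if check_ok:
            self.failures = 0  # a single good check resets the streak
            return False
        self.failures += 1
        # Fire once when the streak reaches the threshold, not on every later failure.
        return self.failures == self.threshold
```

                                                                          With one check per minute and a threshold of three, a real outage is reported after roughly three minutes, which is the detection delay you trade for fewer spurious alerts.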

                                                                          • sbergot 5 hours ago

                                                                            I work on the SSO stack in a b2b company with about 200k monthly active users. One blind spot in our monitoring is when an error occurs on the client's identity provider because of a problem on our side. The service is unusable and we don't have any error logs to raise an alert. We tried to set up an alert based on expected vs actual traffic but we concluded that it would create more problems for the reason you provided.

                                                                            • grinich 32 minutes ago

                                                                              This is off topic, but I’m the founder of WorkOS and we love hiring people with your experience. (WorkOS powers SSO for OpenAI, Anthropic, Cursor, etc.)

                                                                              Send me an email if you’re ever looking for a new job? mg@workos.com

                                                                            • chrismorgan 4 hours ago

                                                                              At Cloudflare’s scale on 1.1.1.1, I’d imagine you could do something comparatively simple like track ten-minute and ten-second rolling averages (I know, I know, I make that sound much easier and more practical than it actually would be), and if they differ by more than 50%, sound the alarm. (Maybe the exact numbers would need to be tweaked, e.g. 20 seconds or 80%, but it’s the idea.)

                                                                              Were it much less than 1.1.1.1 itself, taking longer than a minute to alarm probably wouldn’t surprise me, but this is 1.1.1.1, they’re dealing with vast amounts of probably fairly consistent traffic.
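
                                                                              A minimal Python sketch of the rolling-average comparison suggested above, assuming one traffic sample per second, a 600-sample long window, a 10-sample short window and a 50% drop threshold. `DropDetector` is a hypothetical name and the numbers are the ones floated in the comment, not anything Cloudflare is known to use.

```python
from collections import deque


class DropDetector:
    """Alarm when the short-term average falls far below the long-term one."""

    def __init__(self, long_window=600, short_window=10, max_drop=0.5):
        self.long = deque(maxlen=long_window)    # e.g. ten minutes of samples
        self.short = deque(maxlen=short_window)  # e.g. ten seconds of samples
        self.max_drop = max_drop

    def observe(self, queries_per_second):
        """Feed one sample; return True if the alarm should sound."""
        self.long.append(queries_per_second)
        self.short.append(queries_per_second)
        if len(self.long) < self.long.maxlen:
            return False  # not enough history for a stable baseline yet
        long_avg = sum(self.long) / len(self.long)
        short_avg = sum(self.short) / len(self.short)
        return long_avg > 0 and short_avg < long_avg * (1 - self.max_drop)
```

                                                                              On perfectly steady traffic that suddenly falls to 10% of normal, this fires within a few samples; the hard part in practice is the false-positive rate on real, noisier traffic.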

                                                                              • perlgeek an hour ago

                                                                                I'm sure some engineer at Cloudflare is evaluating something like this right now, and testing on historical data how many false positives it would have generated in the past, if any.

                                                                                Thing is, it's probably still some engineering effort, and most orgs only really improve their monitoring after it turned out to be sub-optimal.

                                                                                • chrismorgan 29 minutes ago

                                                                                  This is hardly the first 1.1.1.1 outage. It’s also probably about the first external monitoring behaviour I imagine you’d come up with. That’s why I’m surprised—more surprised the longer I think about it, actually; more than five minutes is a really long delay to notice such a fundamental breakage.

                                                                                • briangriffinfan an hour ago

                                                                                  I would want to make sure we avoid "We should always do the exact specific thing that would have prevented this exact specific issue"-style thinking.

                                                                              • bombcar an hour ago

                                                                                This is one of those graphs that would have been on the giant wall in the NOC in the old days - someone would glance up and see it had dropped and say “that’s not right” and start scrambling.

                                                                                • TheDong 6 hours ago

                                                                                  I'm not surprised.

                                                                                  Let's say you've got a metric aggregation service, and that service crashes.

                                                                                  What does that result in? Metrics get delayed until your orchestration system redeploys that service elsewhere, which looks like a 100% drop in metrics.

                                                                                  Most orchestration take a sec to redeploy in this case, assuming that it could be a temporary outage of the node (like a network blip of some sort).

                                                                                  Sooo, if you alert after just a minute, you end up with people getting woken up at 2am for nothing.

                                                                                  What happens if you keep waking up people at 2am for something that auto-resolves in 5 minutes? People quit, or eventually adjust the alert to 5 minutes.

                                                                                  I know you often can differentiate no data and real drops, but the overall point, of "if you page people constantly, people will quit" I think is the important one. If people keep getting paged for too tight alarms, the alarms can and should be loosened... and that's one way you end up at 5 minutes.

                                                                                  • croemer 4 hours ago

                                                                                    It's not rocket science. You do a 2 stage thing: Why not check if the aggregation service has crashed before firing the alarm if it's within the first 5 minutes? How many types of false positives can there be? You just need to eliminate the most common ones and you gradually end up with fewer of them.

                                                                                    Before you fire a quick alarm, check that the node is up, check that the service is up etc.
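
                                                                                    A minimal Python sketch of that two-stage decision, assuming a five-minute grace period and two boolean health signals; `should_page` and its parameters are hypothetical:

```python
def should_page(traffic_dropped, seconds_since_drop, node_up, aggregator_up,
                grace_seconds=300):
    """Decide whether to page someone about a traffic drop.

    Inside the grace period, page only if the monitoring path itself is
    healthy, so a crashed metrics aggregator is not mistaken for an outage.
    After the grace period, page regardless.
    """
    if not traffic_dropped:
        return False
    if seconds_since_drop >= grace_seconds:
        return True  # the slow path: whatever the cause, it has not self-healed
    # The fast path: the drop is believable only if the metrics pipeline is up.
    return node_up and aggregator_up
```

                                                                                    This keeps the slow five-minute alarm as a backstop while letting a drop that the metrics pipeline can vouch for page immediately.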

                                                                                    • mentalgear 6 hours ago

                                                                                      It's not wrong for smaller companies. But there's an argument that a big, system-critical company/provider like Cloudflare should be able to afford its own always-on team with a night shift.

                                                                                      • misiek08 6 hours ago

                                                                                        Please don’t. It doesn’t make sense, doesn’t help, doesn’t improve anything and is just a waste of money, time, power and people.

                                                                                        Now without crying: I saw multiple big companies getting rid of their NOC and replacing it with on-call duty in multiple, focused teams. Instead of 12 people sitting 24/7 in groups of 4 and doing some basic analysis and steps before calling others, you page the correct people in 3-5 minutes, with an exact and specific alert.

                                                                                        Incident resolution times went down greatly (2-10x, depending on the company), people don’t have to sit overnight and sleep for most of the time, and no stupid actions like service restarts are taken that slow down incident resolution.

                                                                                        And I don’t like that some platforms hire 1500 people for a job that could be done with 50-100, but in terms of incident response: if you already have teams with separated responsibilities, then a NOC is "legacy".

                                                                                        • immibis 2 hours ago

                                                                                          24/7 on-call is basically mandatory at any major network, which Cloudflare is. Your contractual relations with other networks will require it.

                                                                                        • amelius 5 hours ago

                                                                                          I think it is reasonable if the alarm trigger time is, say, 5-10% of the time required to fix most problems.

                                                                                          • amelius 2 hours ago

                                                                                            Instead of downvoting me, I'd like to know why this is not reasonable?

                                                                                          • chrismorgan 6 hours ago

                                                                                            Not even a night shift, just normal working hours in another part of the world.

                                                                                            • bigiain 5 hours ago

                                                                                              There are kind of big steps/jumps as the size of a company goes up.

                                                                                              Step 1: You start out with the founders being on call 24x7x365, or people in the first 10 or 20 hires "carry the pager" on weekends and evenings, and your entire company is doing unpaid rostered on-call.

                                                                                              Step 2: You steal all the underwear.

                                                                                              Step 3: You have follow-the-sun office-hours support staff teams distributed around the globe with sufficient coverage for vacations and unexpected illness or resignations.

                                                                                              • chrismorgan 4 hours ago

                                                                                                I confess myself bemused by your Step 2.

                                                                                                • bigiain 4 hours ago

                                                                                                  I'm like, come on! It's a South Park reference? Surely everybody here gets that???

                                                                                                  <google google google>

                                                                                                  "Original air date: December 16, 1998"

                                                                                                  Oh, right. Half of you weren't even born... Now I feel ooooooold.

                                                                                        • philipwhiuk 4 hours ago

                                                                                          Remember they have no SLA for this service.

                                                                                          • chrismorgan 3 hours ago

                                                                                            So?

                                                                                            They have a rather significant vested interest in it being reliable.

                                                                                        • alyandon 27 minutes ago

                                                                                            Cloudflare's 1.1.1.1 Resolver service became unavailable to the Internet starting at 21:52 UTC and ending at 22:54 UTC
                                                                                          
                                                                                          Weird. According to my own telemetry from multiple networks they were unavailable for a lot longer than that.
                                                                                          • kachapopopow 4 hours ago

                                                                                            Interesting to see that they probably lost 20% of 1.1.1.1 usage from a roughly 20 minute incident.

                                                                                            Not sure how Cloudflare keeps struggling with issues like these; this isn't the first (and probably won't be the last) time they have had these 'simple', 'deprecated', 'legacy' issues occurring.

                                                                                            8.8.8.8+8.8.4.4 hasn't had a global(1) second of downtime for almost a decade.

                                                                                            1: localized issues did exist, but that's really the fault of the internet and they did remain running when google itself suffered severe downtime in various different services.

                                                                                            • Tepix 4 hours ago

                                                                                              There's more to DNS than just availability (granted, it's very important). There's also speed and privacy.

                                                                                              European users might prefer one of the alternatives listed at https://european-alternatives.eu/category/public-dns over US corporations subject to the CLOUD act.

                                                                                              • adornKey 37 minutes ago

                                                                                                I think just setting up Unbound is even less trouble. Servers come and go. Getting rid of the dependency altogether is better than having to worry who operates the DNS-servers and how long it's going to be available.

                                                                                                • daneel_w an hour ago

                                                                                                  Everyone, European or not, should prefer anything but Cloudflare and Google if they feel that privacy has any value.

                                                                                                  • immibis 2 hours ago

                                                                                                    HN users might prefer to run their own. It's a low maintenance service. It's not like running a mail server.

                                                                                                    • daneel_w an hour ago

                                                                                                      I think that might be overestimating the technical prowess of HN readers on the whole. Sure, it doesn't require wizardry to set up e.g. Unbound as a catch-all DoT forwarder, but it's not the click'n'play most people require. It should be compared to just changing the system resolvers to dns0, Quad9 etc.

                                                                                                • perlgeek 5 hours ago

                                                                                                  An outage of roughly 1 hour is 0.13% of a month or 0.0114% of a year.
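
                                                                                                  The arithmetic behind those two figures, as a quick sketch. It assumes a 31-day month (July) and a 365-day year; `outage_share` is just an illustrative helper name.

```python
HOURS_PER_DAY = 24


def outage_share(outage_hours: float, period_days: float) -> float:
    """Return the outage as a percentage of the whole period."""
    return 100 * outage_hours / (period_days * HOURS_PER_DAY)


print(f"{outage_share(1, 31):.2f}% of a 31-day month")   # 0.13% of a 31-day month
print(f"{outage_share(1, 365):.4f}% of a 365-day year")  # 0.0114% of a 365-day year
```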

                                                                                                  It would be interesting to see the service level objective (SLO) that cloudflare internally has for this service.

                                                                                                  I've found https://www.cloudflare.com/r2-service-level-agreement/ but this seems to be for paid services, so this outage would put July in the "< 99.9% but >= 99.0%" bucket, so you'd get a 10% refund for the month if you paid for it.

                                                                                                  • philipwhiuk 4 hours ago

                                                                                                    Probably 99.9% or better annually just from a 'maintaining reputation for reliability' standpoint.

                                                                                                    • stingraycharles 2 hours ago

                                                                                                      What really matters with these percentages is whether it’s per month or per year. 99.9% per year allows for much longer outages than 99.9% per month.
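
                                                                                                      To make the difference concrete, here is a small sketch of the downtime budget implied by an availability target over different windows. The 30-day month and 365-day year are assumptions, and `downtime_budget_minutes` is an illustrative name.

```python
def downtime_budget_minutes(slo_percent: float, period_days: float) -> float:
    """Minutes of downtime allowed by an availability target over a period."""
    return (1 - slo_percent / 100) * period_days * 24 * 60


monthly = downtime_budget_minutes(99.9, 30)
yearly = downtime_budget_minutes(99.9, 365)

print(f"{monthly:.1f} minutes per 30-day month")  # 43.2 minutes per 30-day month
print(f"{yearly:.1f} minutes per 365-day year")   # 525.6 minutes per 365-day year
```

                                                                                                      Under these assumptions a single outage of about an hour exceeds a monthly 99.9% budget, but fits comfortably inside a yearly one.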

                                                                                                  • CuteDepravity 8 hours ago

                                                                                                    It's crazy that both 1.1.1.1 and 1.0.0.1 were affected by the same change.

                                                                                                    I guess now we should start using a completely different provider as DNS backup. Maybe 8.8.8.8 or 9.9.9.9.

                                                                                                    • sammy2255 8 hours ago

                                                                                                      1.1.1.1 and 1.0.0.1 are served by the same service. It's not advertised as a redundant fully separate backup or anything like that...

                                                                                                      • yjftsjthsd-h 7 hours ago

                                                                                                        Wait, then why does 1.0.0.1 exist? I'll grant I've never seen it advertised/documented as a backup, but I just assumed it must be because why else would you have two? (Given that 1.1.1.1 already isn't actually a single point, so I wouldn't think you need a second IP for load balancing reasons.)

                                                                                                        • kalmar 7 hours ago

                                                                                                          I don't know if it's the reason, but inet_aton[0] and other parsing libraries that match its behaviour will parse 1.1 as 1.0.0.1. I use `ping 1.1` as a quick connectivity test.

                                                                                                          [0] https://man7.org/linux/man-pages/man3/inet_aton.3.html#DESCR...
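
                                                                                                          The shorthand can be checked from Python, whose `socket.inet_aton` hands the string to the platform's C library. A small sketch; the result shown is what the `inet_aton` rules linked above produce, and other platforms may differ.

```python
import ipaddress
import socket

# Two-part form "a.b": a is the first byte, b fills the remaining three bytes.
packed = socket.inet_aton("1.1")
print(socket.inet_ntoa(packed))  # 1.0.0.1

# Stricter parsers reject the shorthand; Python's ipaddress module is one.
try:
    ipaddress.ip_address("1.1")
except ValueError as exc:
    print("rejected:", exc)
```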

                                                                                                          • tom1337 5 hours ago

                                                                                                            Wasn’t it also because a lot of hotel / public routers used 1.1.1.1 for captive portals and therefore you couldn’t use 1.1.1.1?

                                                                                                            • immibis 2 hours ago

                                                                                                              Because operating systems have two boxes for DNS server IP addresses, and Cloudflare wants to be in both positions.

                                                                                                              • ta1243 7 hours ago

                                                                                                                Far quicker to type ping 1.1 than ping 1.1.1.1

                                                                                                                1.0.0.0/24 is a different network than 1.1.1.0/24 too, so can be hosted elsewhere. Indeed right now 1.1.1.1 from my laptop goes via 141.101.71.63 and 1.0.0.1 via 141.101.71.121, which are both hosts on the same LINX/LON1 peer but presumably from different routers, so there is some resilience there.

                                                                                                                Given DNS is about the easiest thing to avoid a single point of failure on I'm not sure why you would put all your eggs in a single company, but that seems to be the modern internet - centralisation over resilience because resilience is somehow deemed to be hard.
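
                                                                                                                The point about the two addresses living in different /24 networks can be verified with Python's standard `ipaddress` module. This only shows that the prefixes do not overlap; it says nothing about how or where they are actually announced.

```python
import ipaddress

net_a = ipaddress.ip_network("1.0.0.0/24")
net_b = ipaddress.ip_network("1.1.1.0/24")

print(ipaddress.ip_address("1.0.0.1") in net_a)  # True
print(ipaddress.ip_address("1.1.1.1") in net_b)  # True
print(net_a.overlaps(net_b))                     # False
```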

                                                                                                                • yjftsjthsd-h 7 hours ago

                                                                                                                  > Far quicker to type ping 1.1 than ping 1.1.1.1

                                                                                                                  I guess. I wouldn't have thought it worthwhile for 4 chars, but yes.

                                                                                                                  > 1.0.0.0/24 is a different network than 1.1.1.0/24 too, so can be hosted elsewhere.

                                                                                                                  I thought anycast gave them that on a single IP, though perhaps this is even more resilient?

                                                                                                                  • darkwater 7 hours ago

                                                                                                                    Not a network expert but anycast will give you different routes depending on where you are. But having 2 IPs will give you different routes to them from the same location. In this case since the error was BGP related, and they clearly use the same system to announce both IPs, both were affected.

                                                                                                            • 0xbadcafebee 6 hours ago

                                                                                                              In general, the idea of DNS's design is to use the DNS resolver closest to you, rather than the one run by the largest company.

                                                                                                              That said, it's a good idea to specifically pick multiple resolvers in different regions, on different backbones, using different providers, and not use an Anycast address, because Anycast can get a little weird. However, this can lead to hard-to-troubleshoot issues, because DNS doesn't always behave the way you expect.

                                                                                                              • ben0x539 6 hours ago

                                                                                                                Isn't the largest company most likely to have the DNS resolver closest to me?

                                                                                                                • sschueller 3 hours ago

                                                                                                                  No, your ISP can have a server closer than any external one.

                                                                                                                  • fragmede 6 hours ago

                                                                                                                    Your ISP should have a DNS resolver closer to you. "Should" doesn't necessarily mean faster, however.

                                                                                                                    • lxgr an hour ago

                                                                                                                      I’ve had ISPs with a DNS server (configured via DHCP) farther away than 1.1.1.1 and 8.8.8.8.

                                                                                                                  • dontTREATonme 6 hours ago

                                                                                                                    What’s your recommendation for finding the dns resolver closest to me? I currently use 1.1 and 8.8, but I’m absolutely open to alternatives.
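
                                                                                                                    One rough way to approach this empirically is to time a query against each candidate resolver. The sketch below is an assumption-laden illustration: `build_query` and `measure_rtt_ms` are hypothetical helpers, it only speaks plain UDP over IPv4, it does not validate the response, and a single measurement is noisy (a cache miss on the resolver inflates it), so repeat it several times before drawing conclusions.

```python
import socket
import struct
import time
from typing import Optional


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD bit set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def measure_rtt_ms(server: str, name: str = "example.com",
                   timeout: float = 2.0) -> Optional[float]:
    """Return the round-trip time in milliseconds, or None on error/timeout."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(query, (server, 53))
            sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable networks
            return None
        return (time.perf_counter() - start) * 1000


# Example usage (needs network access):
# for server in ("1.1.1.1", "8.8.8.8", "9.9.9.9"):
#     print(server, measure_rtt_ms(server))
```

                                                                                                                    Adding the resolver handed out by your ISP via DHCP to the list gives a direct comparison.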

                                                                                                                    • LeoPanthera 5 hours ago

                                                                                                                      The closest DNS resolver to you is the one run by your ISP.

                                                                                                                      • JdeBP an hour ago

                                                                                                                        Actually, it's about 20cm from my left elbow, which is physically several orders of magnitude closer than anything run by my ISP, and logically at least 2 network hops closer.

                                                                                                                        And the closest resolving proxy DNS server for most of my machines is listening on their loopback interface. The closest such machine happens to be about 1m away, so is beaten out of first place by centimetres. (-:

                                                                                                                        It's a shame that Microsoft arbitrarily ties such functionality to the Server flavour of Windows, and does not supply it on the Workstation flavour, but other operating systems are not so artificially limited or helpless; and even novice users on such systems can get a working proxy DNS server out of the box that their sysops don't actually have to touch.

                                                                                                                        The idea that one has to rely upon an ISP, or even upon CloudFlare and Google and Quad9, for this stuff is a bit of a marketing tale that is put about by these self-same ISPs and CloudFlare and Google and Quad9. Not relying upon them is not actually limited to people who are skilled in system operation, i.e. who they are; but rather merely limited by what people run: black box "smart" tellies and whatnot, and the Workstation flavour of Microsoft Windows. Even for such machines, there's the option of a decent quality router/gateway or simply a small box providing proxy DNS on the LAN.

                                                                                                                        In my case, said small box is roughly the size of my hand and is smaller than my mass-market SOHO router/gateway. (-:

                                                                                                                        • lxgr an hour ago

                                                                                                                          Is that really a win in terms of latency, considering that the chance of a cache hit increases with the number of users?

                                                                                                                      • baobabKoodaa 2 hours ago

                                                                                                                        Windows 11 doesn't allow using that combination

                                                                                                                    • codingminds 8 hours ago

                                                                                                                      Hasn't that always been the case?

                                                                                                                      • bigiain 5 hours ago

                                                                                                                        I mean, aren't we already?

                                                                                                                        My Pi-holes both use OpenDNS, Quad9, and CloudFlare for upstream.

                                                                                                                        Most of my devices use both of my Pi-holes.

                                                                                                                        • globular-toast 7 hours ago

                                                                                                                          In general there's no such thing as "DNS backup". Most clients just arbitrarily pick one from the list, they don't fall back to the other one in case of failure or anything. So if one went down you'd still find many requests timing out.

                                                                                                                          • JdeBP an hour ago

                                                                                                                            The reality is that it's rather complicated to say what "most clients" do, as there is some behavioural variation amongst the DNS client libraries when they are configured with multiple IP addresses to contact. So whilst it's true to say that fallback and redundancy does not always operate as one might suppose at the DNS client level, it is untrue to go to the opposite extreme and say that there's no such thing at all.
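
                                                                                                                            To illustrate why the client's strategy matters, here is a toy model of two of the behaviours mentioned in this thread: trying servers strictly in list order, and picking one at random with no fallback. Both functions and the 5-second timeout are assumptions made for the example; real DNS client libraries differ in their timeouts, retries and rotation.

```python
import random
from typing import Optional, Sequence


def in_order_delay(servers_up: Sequence[bool], timeout_s: float) -> Optional[float]:
    """Seconds spent on timeouts before an answer when trying servers in order.

    Returns None if every configured server is down.
    """
    waited = 0.0
    for up in servers_up:
        if up:
            return waited
        waited += timeout_s
    return None


def random_pick_failure_rate(servers_up: Sequence[bool],
                             trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of queries that fail if one server is picked at random, no fallback."""
    rng = random.Random(seed)
    failures = sum(not rng.choice(servers_up) for _ in range(trials))
    return failures / trials


# First server down, second up, assumed 5 s per-server timeout.
print(in_order_delay([False, True], timeout_s=5.0))  # 5.0
print(random_pick_failure_rate([False, True]))       # close to 0.5
```

                                                                                                                            In this model, in-order fallback still answers every query but pays the timeout each time, while random selection without fallback answers quickly or not at all.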

                                                                                                                        • nu11ptr an hour ago

                                                                                                                          Question: Years ago, back when I used to do networking, Cisco Wireless controllers used 1.1.1.1 internally. They seemed to literally blackhole any comms to that IP in my testing. I assume they changed this when 1.0.0.0/8 started routing on the Internet?

                                                                                                                          • blurrybird an hour ago

                                                                                                                            Yeah part of the reason why APNIC granted Cloudflare access to those very lucrative IPs is to observe the misconfiguration volume.

                                                                                                                            The theory is CF had the capacity to soak up the junk traffic without negatively impacting their network.

                                                                                                                            • yabones 34 minutes ago

                                                                                                                              The general guidance for networking has been to only use IPs and domains that you actually control... But even 5-8 years ago, the last time I personally touched a Cisco WLC box, it still had 1.1.1.1 hardcoded. Cisco loves to break their own rules...

                                                                                                                            • Mindless2112 8 hours ago

                                                                                                                              Interesting that traffic didn't return to completely normal levels after the incident.

                                                                                                                              I recently started using the "luci-app-https-dns-proxy" package on OpenWrt, which is preconfigured to use both Cloudflare and Google DNS, and since DoH was mostly unaffected, I didn't notice an outage. (Though if DoH had been affected, it presumably would have failed over to Google DNS anyway.)

                                                                                                                              • caconym_ 8 hours ago

                                                                                                                                > Interesting that traffic didn't return to completely normal levels after the incident.

                                                                                                                                Anecdotally, I figured out their DNS was broken before it hit their status page and switched my upstream DNS over to Google. Haven't gotten around to switching back yet.

                                                                                                                                • radicaldreamer 8 hours ago

                                                                                                                                  What would be a good reason to switch back from Google DNS?

                                                                                                                                  • Algent 4 hours ago

                                                                                                                                    After trying both several times, I've since stayed with Google due to Cloudflare always returning really bad IPs for anything involving a CDN. Having users complain that stuff takes ages to load because you got matched to an IP on the opposite side of the planet is a bit problematic, especially when it rarely happens on other DNS providers. Maybe there is a way to fix this, but I admit I went for the easier option of going back to good old 8.8.8.8.

                                                                                                                                    • homebrewer an hour ago

                                                                                                                                      No, it's deliberately not implemented:

                                                                                                                                      https://developers.cloudflare.com/1.1.1.1/faq/#does-1111-sen...

                                                                                                                                      I've also changed to 9.9.9.9 and 8.8.8.8 after using 1.1.1.1 for several years because connectivity here is not very good, and being connected to the wrong data center means RTT in excess of 300 ms. Makes the web very sluggish.

                                                                                                                                    • sammy2255 8 hours ago

                                                                                                                                      Depends who you trust more with your DNS traffic. I know who I trust more.

                                                                                                                                      • nojs 6 hours ago

                                                                                                                                        Who? Honest question

                                                                                                                                        • Elucalidavah 6 hours ago

                                                                                                                                          Realistically, either you ignore the privacy concerns and set up routing to multiple providers preferring the fastest, or you go all-in on privacy and route DNS over Tor over bridge.

                                                                                                                                          Although, perhaps, having an external VPS with a dns proxy could be a good middle ground?

                                                                                                                                          • daneel_w an hour ago

                                                                                                                                            If you're the technical type you can run Unbound locally (even on Windows) and let it forward queries with DoT. No need for neither Tor nor running your own external resolver.

                                                                                                                                            • Tijdreiziger 3 hours ago

                                                                                                                                              Middle ground is ISP DNS, right?

                                                                                                                                            • daneel_w an hour ago

                                                                                                                                              Quad9, dns0.

                                                                                                                                              • immibis 2 hours ago

                                                                                                                                                Myself, I suppose? Recursive resolvers are low-maintenance, and you get less exposure to ISP censorship (which "developed" countries also do).

                                                                                                                                                • misiek08 6 hours ago

                                                                                                                                                  Google is serving you ads, CF isn’t.

                                                                                                                                                  And it’s not conspiracy theory - it was very suspicious when we did some testing on small, aware group. The traffic didn’t look like being handled anonymously at Google side

                                                                                                                                                  • mnordhoff 6 hours ago

                                                                                                                                                    Unless the privacy policy changed recently, Google shouldn't be doing anything nefarious with 8.8.8.8 DNS queries.

                                                                                                                                                    • DarkCrusader2 5 hours ago

                                                                                                                                                      They weren't supposed to do anything with our gmail data as well. That didn't stop them.

                                                                                                                                                      • Tijdreiziger 3 hours ago

                                                                                                                                                        [citation needed]

                                                                                                                                          • anon7000 8 hours ago

                                                                                                                                            They go into that more towards the end, sounds like some smaller % of servers needed more direct intervention

                                                                                                                                            • motorest 7 hours ago

                                                                                                                                              > Interesting that traffic didn't return to completely normal levels after the incident.

                                                                                                                                              Clients cache DNS resolutions to avoid having to do that request each time they send a request. It's plausible that some clients held on to their cache for a significant period.

                                                                                                                                            • 0xbadcafebee 6 hours ago

                                                                                                                                              > A configuration change was made for the same DLS service. The change attached a test location to the non-production service; this location itself was not live, but the change triggered a refresh of network configuration globally.

                                                                                                                                              Say what now? A test triggered a global production change?

                                                                                                                                              > Due to the earlier configuration error linking the 1.1.1.1 Resolver's IP addresses to our non-production service, those 1.1.1.1 IPs were inadvertently included when we changed how the non-production service was set up.

                                                                                                                                              You have a process that allows some other service to just hoover up address routes already in use in production by a different service?

                                                                                                                                              • i_niks_86 6 hours ago

                                                                                                                                                Many commenters assume fallback behavior exists between DNS providers, but in practice, DNS clients - especially at the OS or router level - rarely implement robust failover for DoH. If you're using cloudflare-dns(.)com and it goes down, unless the stub resolver or router explicitly supports multi-provider failover (and uses a trust-on-first-use or pinned cert model), you’re stuck. The illusion of redundancy with DoH needs serious UX rethinking.

                                                                                                                                                • tankenmate 6 hours ago

                                                                                                                                                  I use routedns[0] for this specific reason: it handles almost all DNS protocols, including UDP, TCP, DoT, DoH, and DoQ (with 0-RTT). More importantly, it has very configurable route steering, even down to a record-by-record basis if you want to put up with all the configuration involved. It's very robust and very handy. I use 1.1.1.1 on my desktops and servers, and when the incident happened I didn't even notice, as the failover "just worked". I actually had to go look at the logs.

                                                                                                                                                  [0] https://github.com/folbricht/routedns

                                                                                                                                                • neurostimulant 2 hours ago

                                                                                                                                                  I never noticed the outage because my isp hijack all outbound udp traffic to port 53 and redirect them to their own dns server so they can apply government-mandated cencorship :)

                                                                                                                                                  • dawnerd 8 hours ago

                                                                                                                                                    Oh this explains a lot. I kept having random connection issues and when I disabled AdGuard dns (self hosted) it started working so I just assumed it was something with my vm.

                                                                                                                                                    • angst 8 hours ago

                                                                                                                                                      I wonder how uptime ratio of 1.1.1.1 is against 8.8.8.8

                                                                                                                                                      Maybe there is noticeable difference?

                                                                                                                                                      I have seen more outage incident reports of cloudflare than of google, but this is just personal anecdote.

                                                                                                                                                      • Pharaoh2 5 hours ago

                                                                                                                                                        https://www.dnsperf.com/#!dns-resolvers

                                                                                                                                                        Last 30 days, 8.8.8.8 has 99.99% uptime vs 1.1.1.1 has 99.09%
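For scale, those two percentages are far apart once converted to time: roughly 4 minutes versus about 6.5 hours over 30 days. This assumes the figures are plain time-based availability; how dnsperf actually measures uptime is not stated here, and `downtime_minutes` is just an illustrative helper name.

```python
def downtime_minutes(uptime_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime implied by an uptime percentage over a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# Figures quoted in the comment above (30-day window).
for name, uptime in [("8.8.8.8", 99.99), ("1.1.1.1", 99.09)]:
    minutes = downtime_minutes(uptime)
    print(f"{name}: {minutes:.1f} min (~{minutes / 60:.1f} h) down in 30 days")
```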

                                                                                                                                                        • ta1243 6 hours ago

                                                                                                                                                          I guess it depends on where you are and what you count as an outage. Is a single failed query an outage?

                                                                                                                                                          For me cloudflare 1.1.1.1 and 1.0.0.1 have a mean response time of 15.5ms over the last 3 months, 8.8.8.8 and 8.8.4.4 are 15.0ms, and 9.9.9.9 is 13.8ms.

                                                                                                                                                          All of those servers return over 3-nines of uptime when quantised as the "worst result in a given 1 minute bucket" from my monitoring points, which seems fine to have in your mix of upstream providers. Personally I'd never rely on a single provider. Google gets 4 nines, but that's only over 90 days so I wouldn't draw any long term conclusions.
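A minimal sketch of what "worst result in a given 1 minute bucket" could look like, assuming probes arrive as `(unix_timestamp, ok)` pairs. The function name and data shape are hypothetical, not the commenter's actual monitoring setup.

```python
from collections import defaultdict


def bucketed_uptime(probes, bucket_seconds=60):
    """Fraction of buckets in which every probe succeeded.

    probes: iterable of (unix_timestamp, ok) pairs. A bucket counts as
    "down" if any probe inside it failed (worst result wins).
    """
    worst = defaultdict(lambda: True)
    for timestamp, ok in probes:
        bucket = int(timestamp // bucket_seconds)
        worst[bucket] = worst[bucket] and ok
    if not worst:
        raise ValueError("no probes supplied")
    return sum(worst.values()) / len(worst)


# Hypothetical data: 4 probes across 3 one-minute buckets, one failure.
sample = [(0, True), (30, False), (60, True), (120, True)]
print(bucketed_uptime(sample))  # 2 of 3 buckets fully up
```

Counting a whole bucket as down on a single failed probe is the pessimistic choice, which is why it partly answers the "is a single failed query an outage?" question: under this scheme, yes, for that minute.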

                                                                                                                                                        • trollbridge 2 hours ago

                                                                                                                                                          I got bit by this, so dnsmasq now has 1.1.1.2, Quad9, and Google’s 8.8.8.8 with both primary and secondary.

                                                                                                                                                          Secondary DNS is supposed to be in an independent network to avoid precisely this.

                                                                                                                                                          • nness 4 hours ago

                                                                                                                                                            Interesting side-effect, the Gluetun docker image uses 1.1.1.1 for DNS resolution — as a result of the outage Gluetun's health checks failed and the images stopped.

                                                                                                                                                            If there were some way to view torrenting traffic, no doubt there'd be a 20 minute slump.

                                                                                                                                                            • udev4096 5 hours ago

                                                                                                                                                              This is why running your own resolver is so important. Clownflare will always break something or backdoor something

                                                                                                                                                              • wreckage645 3 hours ago

                                                                                                                                                                This is a good post mortem, but improvements only come with changes to processes. It seems every team at CloudFlare is approaching this in isolation, without central problem management. Every week we see a new CloudFlare global outage. It seems like the change management process is broken and needs to be looked at.

                                                                                                                                                                • geoffpado 8 hours ago

                                                                                                                                                                  This was quite annoying for me, having only switched my DNS server to 1.1.1.1 approximately 3 weeks ago to get around my ISP having a DNS outage. Is reasonably stable DNS really so much to ask for these days?

                                                                                                                                                                  • codingminds 8 hours ago

                                                                                                                                                                    If you consume a service that's free of charge, it's at least not reasonable to complain if there's an outage.

                                                                                                                                                                    Like mentioned by other comments, do it on your own if you are not happy with the stability. Or just pay someone to provide it - like your ISP..

                                                                                                                                                                    And TBH I trust my local ISP more than Google or CF. Not in availability, but it's covered by my local legislature. That's a huge difference - in a positive way.

                                                                                                                                                                    • chii 5 hours ago

                                                                                                                                                                      > it's covered by my local legislature

                                                                                                                                                                      which might not be a good thing in some jurisdictions - see the porn block in the UK (it's done via dns iirc, and trivially bypassed with a third party dns like cloudflare's).

                                                                                                                                                                      • komali2 7 hours ago

                                                                                                                                                                        > it's at least not reasonable to complain if there's an outage.

                                                                                                                                                                        I don't think this is fair when discussing infrastructure. It's reasonable to complain about potholes, undrinkable tap water, long lines at the DMV, cracked (or nonexistent) sidewalks, etc. The internet is infrastructure and DNS resolution is a critical part of it. That it hasn't been nationalized doesn't change the fact that it's infrastructure (and access absolutely should be free) and therefore everyone should feel free to complain about it not working correctly.

                                                                                                                                                                        "But you pay taxes for drinkable tap water," yes, and we paid taxes to make the internet work too. For some reason, some governments like the USA feel it to be a good idea to add a middle man to spend that tax money on, but, fine, we'll complain about the middle man then as well.

                                                                                                                                                                        • gkbrk 6 hours ago

                                                                                                                                                                          But you can just run a recursive resolver. Plenty of packages to install. The root DNS servers were not affected, so you would have been just fine.

                                                                                                                                                                          DNS is infrastructure. But "Cloudflare Public Free DNS Resolver" is not, it's just a convenience and a product to collect data.

                                                                                                                                                                          • JdeBP an hour ago

                                                                                                                                                                            One can even run a private root content DNS server, and not be affected by root problems either.

                                                                                                                                                                            (This isn't a major concern, of course; and I mention it just to extend your argument yet further. The major gain of a private root content DNS server is the fraction of really stupid nonsense DNS traffic that comes about because of various things gets filtered out either on-machine or at least without crossing a border router. The gains are in security and privacy more than uptime.)

                                                                                                                                                                          • delfinom 18 minutes ago

                                                                                                                                                                            >That it hasn't been nationalized doesn't change the fact that it's infrastructure (and access absolutely should be free) and therefore everyone should feel free to complain about it not working correctly.

                                                                                                                                                                            >"But you pay taxes for drinkable tap water," yes, and we paid taxes to make the internet work too. For some reason, some governments like the USA feel it to be a good idea to add a middle man to spend that tax money on, but, fine, we'll complain about the middle man then as well.

                                                                                                                                                                            You don't want DNS to be nationalized. Even the US would have half the internet banned by now.

                                                                                                                                                                            • codingminds 6 hours ago

                                                                                                                                                                              You are right infrastructure is important.

                                                                                                                                                                              But opposite to tap water there are a lot of different free DNS resolvers that can be used.

                                                                                                                                                                              And I don't see how my taxes funded CFs DNS service. But my ISP fee covers their DNS resolving setup. That's the reason why I wrote

                                                                                                                                                                              > a service that's free of charge

                                                                                                                                                                              Which CF is.

                                                                                                                                                                              • komali2 4 hours ago

                                                                                                                                                                                DNS shouldn't be privatized at all since it's a critical part of internet infrastructure, however at the same time the idea that somehow it's something a corporation should be allowed to sell to you at all (or "give you for free") is silly given that the service is meaningless without the infrastructure of the internet, which is built by governments (through taxes). I can't even think of an equivalent it's so ridiculous that it's allowed at all, my best guess would be maybe, if your landlord was allowed to charge you for walking on the sidewalk in front of the apartment or something.

                                                                                                                                                                                • codingminds 2 hours ago

                                                                                                                                                                                  DNS is not privatized. This is not about the root DNS servers, it's just about one of many free resolvers out there - in this case one of the bigger and popular ones.

                                                                                                                                                                          • bauruine 8 hours ago

                                                                                                                                                                            Why not use multiple? You can use 1.1.1.1, your ISPs and google at the same time. Or just run a resolver yourself.

                                                                                                                                                                            • ripdog 5 hours ago

                                                                                                                                                                              >Or just run a resolver yourself.

                                                                                                                                                                              I did this for a while, but ~300ms hangs on every DNS resolution sure do get old fast.

                                                                                                                                                                              • xpe 24 minutes ago

                                                                                                                                                                                Ouch. What resolver? What hardware?

                                                                                                                                                                                With something like a N100- or N150-based single board computer (perhaps around $200) running any number of open source DNS resolvers, I would expect you can average around 30 ms for cold lookups and <1 ms for cache hits.

                                                                                                                                                                            • pparanoidd 7 hours ago

                                                                                                                                                                              A single incident means 1.1.1.1 is no longer reasonably stable? You are the unreasonable one

                                                                                                                                                                              • yjftsjthsd-h 7 hours ago

                                                                                                                                                                                Although I agree 1.1.1.1 is fine: To this particular commenter they've had one major outage in 3 total weeks of use, which isn't exactly a good record. (And it's understandable to weigh personal experience above other people claiming this isn't representative.)

                                                                                                                                                                                • geoffpado 4 hours ago

                                                                                                                                                                                  Two incidents from two completely different providers in three weeks means that my personal experience with DNS is remarkably less stable recently than the last 20-ish years I've been using the Internet.

                                                                                                                                                                                  • cryptonym 6 hours ago

                                                                                                                                                                                    I have been online for 30y and can't remember being affected by downtime from my ISP DNS.

                                                                                                                                                                                    When DNS resolver is down, it affects everything, 100% uptime is a fair expectation, hence redundancy. Looks like both 1.0.0.1 and 1.1.1.1 were down for more than 1h, pretty bad TBH, especially when you advise global usage.

                                                                                                                                                                                    RCA is not detailed and feels like a marketing stunt we are now getting every other week.

                                                                                                                                                                                  • bjoli 8 hours ago

                                                                                                                                                                                    Run your own forwarder locally. Technitium dns makes it easy.

                                                                                                                                                                                  • greggsy 3 hours ago

                                                                                                                                                                                    I’d love to know which legacy systems they’re referring to.

                                                                                                                                                                                    • rswail 6 hours ago

                                                                                                                                                                                      I now run unbound locally as a recursive DNS server, which really should be the default. There's no reason not to in modern routers.

                                                                                                                                                                                      Not sure what the "advantage" of stub resolvers is in 2025 for anything.

                                                                                                                                                                                      • egamirorrim 6 hours ago

                                                                                                                                                                                        What's that about a hijack?

                                                                                                                                                                                        • homero 6 hours ago

                                                                                                                                                                                          Related, non-causal event: BGP origin hijack of 1.1.1.0/24 exposed by withdrawal of routes from Cloudflare. This was not a cause of the service failure, but an unrelated issue that was suddenly visible as that prefix was withdrawn by Cloudflare.

                                                                                                                                                                                          • JdeBP an hour ago

                                                                                                                                                                                            And because people highlighted it on social media at the time of the outage, many thought that the bogus route was the cause of the problem.

                                                                                                                                                                                            • kylestanfield 5 hours ago

                                                                                                                                                                                              So someone just started advertising the prefix when it was up for grabs? That’s pretty funny

                                                                                                                                                                                              • woutifier 5 hours ago

                                                                                                                                                                                                No they were already doing that, the global withdrawal of the legitimate route just exposed it.

                                                                                                                                                                                          • thunderbong 8 hours ago

                                                                                                                                                                                            How does Cloudflare compare with OpenDNS?

                                                                                                                                                                                            • blurrybird an hour ago

                                                                                                                                                                                              You’d be better off comparing it to Quad9 based on performance, privacy claims, and response accuracy.

                                                                                                                                                                                            • nodesocket 7 hours ago

                                                                                                                                                                                              I used to configure 1.1.1.1 as primary and 8.8.8.8 as secondary, but noticed that Cloudflare was on aggregate quicker to respond to queries, so I changed everything to use 1.1.1.1 and 1.0.0.1. Perhaps I'll switch back to using 8.8.8.8 as secondary, though my understanding is that DNS round-robins between primary and secondary rather than using the secondary ONLY if the primary is down. Perhaps I am wrong though.

                                                                                                                                                                                              EDIT: Appears I was wrong, it is failover not round-robin between the primary and secondary DNS servers. Thus, using 1.1.1.1 and 8.8.8.8 makes sense.

                                                                                                                                                                                              • ta1243 6 hours ago

                                                                                                                                                                                                Depends on how you configure it. In resolv.conf systems for example you can set a timeout of say 1 second and do it as main/reserve, or set it up to round-robin. From memory it's something like "options:rotate"

                                                                                                                                                                                                If you have a more advanced local resolver of some sort (systemd for example) you can configure whatever behaviour you want.
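
To make the main/reserve versus round-robin distinction concrete, here is a minimal Python sketch. It is a simplified model of how a stub resolver might pick servers, not glibc's actual implementation, and `try_order` and `render_resolv_conf` are hypothetical helper names. The `timeout:` and `rotate` options are the ones documented in resolv.conf(5).

```python
def try_order(servers, query_index, rotate=False):
    """Order in which a stub resolver would try servers for one query.

    Simplified model: without rotate, every query starts at the first
    server (main/reserve); with rotate, the starting server advances by
    one per query (round-robin).
    """
    start = query_index % len(servers) if rotate else 0
    return servers[start:] + servers[:start]


def render_resolv_conf(servers, timeout=1, rotate=False):
    """Render resolv.conf text for the given servers and options."""
    lines = [f"nameserver {ip}" for ip in servers]
    options = [f"timeout:{timeout}"]
    if rotate:
        options.append("rotate")
    lines.append("options " + " ".join(options))
    return "\n".join(lines) + "\n"


servers = ["1.1.1.1", "8.8.8.8"]
for q in range(3):
    # Left: main/reserve order. Right: order with rotate enabled.
    print(q, try_order(servers, q), try_order(servers, q, rotate=True))
print(render_resolv_conf(servers, timeout=1, rotate=True))
```

In this model, with rotate off, 8.8.8.8 is only tried after 1.1.1.1 has timed out, so a dead primary costs one timeout on every lookup. With rotate on, consecutive queries alternate their starting server.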

                                                                                                                                                                                              • hkon 6 hours ago

                                                                                                                                                                                                To say I was surprised when I finally checked Cloudflare's status page is an understatement.

                                                                                                                                                                                                • sylware 3 hours ago

                                                                                                                                                                                                  Cloudflare is providing a service designed to block noscript/basic (x)html browsers.

                                                                                                                                                                                                  I know.

                                                                                                                                                                                                  • sneak 6 hours ago

                                                                                                                                                                                                    1.1.1.1 does not operate in isolation.

                                                                                                                                                                                                    It is designed to be used in conjunction with 1.0.0.1. DNS has fault tolerance built in.

                                                                                                                                                                                                    Did 1.0.0.1 go down too? If so, why were they on the same infrastructure?

                                                                                                                                                                                                    This makes no sense to me. 8.8.8.8 also has 8.8.4.4. The whole point is that it can go down at any time and everything keeps working.

                                                                                                                                                                                                    Shouldn’t the fix be to ensure that these are served out of completely independent silos and update all docs to make sure anyone using 1.1.1.1 also has 1.0.0.1 configured as a backup?

                                                                                                                                                                                                    If I ran a service like this I would regularly do blackouts or brownouts on the primary to make sure that people’s resolvers are configured correctly. Nobody should be using a single IP as a point of failure for their internet access/browsing.
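
The "no single IP as a point of failure" rule is easy to check mechanically. Below is a rough Python sketch, assuming you already have the list of configured resolver IPs; `KNOWN_OPERATORS` and `audit_resolvers` are hypothetical names, and the operator table only covers the well-known IPv4 addresses mentioned in this thread. The second check (everything run by one operator) goes a step beyond the 1.1.1.1 + 1.0.0.1 pairing and reflects the suggestion elsewhere in the thread to mix providers.

```python
# Operator labels for well-known public resolver addresses (IPv4 only).
KNOWN_OPERATORS = {
    "1.1.1.1": "Cloudflare",
    "1.0.0.1": "Cloudflare",
    "8.8.8.8": "Google",
    "8.8.4.4": "Google",
    "9.9.9.9": "Quad9",
    "149.112.112.112": "Quad9",
}


def audit_resolvers(ips):
    """Return a list of warnings about a resolver configuration."""
    warnings = []
    distinct = set(ips)
    if len(distinct) < 2:
        warnings.append("fewer than two distinct resolver IPs configured")
        return warnings
    # Unknown addresses are treated as their own operator.
    operators = {KNOWN_OPERATORS.get(ip, ip) for ip in distinct}
    if len(operators) == 1:
        warnings.append(
            "all resolvers are run by one operator: " + operators.pop()
        )
    return warnings


print(audit_resolvers(["1.1.1.1"]))
print(audit_resolvers(["1.1.1.1", "1.0.0.1"]))
print(audit_resolvers(["1.1.1.1", "8.8.8.8"]))
```

An empty list means neither check fired. It says nothing about whether the client actually falls back correctly when the first server stops answering.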

                                                                                                                                                                                                    • detaro 5 hours ago

                                                                                                                                                                                                      You don't need to test if peoples resolvers handle this cleanly, because its already known that many don't. DNS fallback behavior across platforms is a mess.
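
For comparison, this is roughly what explicit fallback looks like if an application does it itself instead of relying on the platform. It is a minimal Python sketch using only the standard library: it sends a plain A query over UDP to each server in order and moves on after a timeout. `build_query` and `first_responder` are hypothetical names, the 1-second timeout is an assumption, and it is IPv4-only with no TCP retry and no answer parsing.

```python
import secrets
import socket
import struct


def build_query(name, query_id):
    """Build a minimal DNS query for an A record (RD bit set)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte.
    # (No check for the 63-byte label limit in this sketch.)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 (A), QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def first_responder(name, servers, timeout=1.0):
    """Try servers in order; return the first one that answers, else None."""
    for server in servers:
        query_id = secrets.randbelow(65536)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(build_query(name, query_id), (server, 53))
                reply, _ = sock.recvfrom(512)
            except OSError:
                # Timeout or network error: fall through to the next server.
                continue
        # Accept the reply only if its ID matches the query we sent.
        if len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == query_id:
            return server
    return None


# Example (needs network access):
# first_responder("example.com", ["1.1.1.1", "8.8.8.8"])
```

The worst case here is one full timeout per dead server before an answer arrives, which is the same cost a strictly ordered stub resolver pays on every lookup during an outage.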