• LeoPanthera 2 hours ago

    It's not going to be long before we need to move to a whitelist model, rather than a blacklist model.

    It ironically makes me think of the Yahoo Web Directory in the 90s.

    Time is a flat circle.

  • Ringz 20 hours ago

    Installed! This should not be a function of the search engine nor a plugin. This should be integrated in the browser.

    Another great function (not for this plugin) should be the option to "bundle" all search results from the same domain. Stuff them under one collapsible entry. I hate going through lists and pages of apple/google/synology/sonos/crab urls when I already know that I have to search somewhere else.
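
    A rough sketch of that bundling idea, assuming the results are already available as a plain list of URLs. The function name `bundle_by_domain` is made up for illustration, and it groups by hostname rather than by registrable domain (which would need the Public Suffix List):

```python
from urllib.parse import urlsplit


def bundle_by_domain(urls):
    """Group result URLs by hostname, keeping first-seen order of hosts."""
    bundles = {}  # dicts preserve insertion order in Python 3.7+
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        # Treat www.example.com and example.com as the same bundle.
        if host.startswith("www."):
            host = host[4:]
        bundles.setdefault(host, []).append(url)
    return bundles


if __name__ == "__main__":
    results = [
        "https://support.apple.com/a",
        "https://example.org/x",
        "https://support.apple.com/b",
    ]
    for host, group in bundle_by_domain(results).items():
        print(f"{host} ({len(group)} results)")
```

    Each key could then be rendered as one collapsible entry, with the grouped URLs listed underneath it.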

    • cormorant a day ago

      I'm fed up too. Spammy, AI-looking sites are showing up more and more. For some reason, many of them use the same WordPress theme with a light gray table of contents - they look like this: https://imgur.com/a/totally-not-ai-generated-efsumgZ

      The problem seems worse on "alternative" search engines, e.g. DuckDuckGo and Kagi, which both use Bing. It's been driving me back to Google.

      A blocklist seems like a losing proposition, unless, like adblock filter lists, it balloons to tens of thousands of entries and gets updated constantly.

      Unfortunately, this kind of blocklist is highly subjective. This list blocks MSN.com! That's hardly what I would have chosen.

      • popcar2 a day ago

        Even Google is plagued by spam. I've tried all sorts of search techniques and alternative engines, but I feel like the only solution is doing things manually. I was already starting to block things by myself, but I thought it'd be more productive to make the list public and try crowdsourcing. Even now, searching "how to partition a hard disk" often leads you to low-effort sites telling you to use their software.

        > Unfortunately, this kind of blocklist is highly subjective. This list blocks MSN.com! That's hardly what I would have chosen.

        It's definitely a bit opinionated, but it's open to discussion - you can create an unblock request issue (if you care enough to do so, of course!). The reason I blocked MSN is that it just re-hosts articles from other websites, so I'd rather see the official source than be tricked into visiting Microsoft's site, which is very annoying: for example, it opens another article if you scroll down too fast.

        • maximilianthe1 an hour ago

          Recently learned a little trick for Google. Adding `-ai` at the end of the query helps. Not much, but something.

        • radicality 20 hours ago

          Afaik DDG is just Bing, whereas Kagi is using Google, Bing, (Yandex?) among others - https://help.kagi.com/kagi/search-details/search-sources.htm...

          As a Kagi user I actually haven’t encountered much search result spam, surprised you’re seeing enough there to drive you back to Google!

          • rendaw 7 hours ago

            I get tons when looking up recipes and cooking related information. Things that will say "X can be refrigerated for up to two weeks" then in the next paragraph "X is fine to refrigerate and eat for 2-3 days" or similar.

            I'd block them but there seem to be infinite. They're probably buying 10+ character domains using random words/names/phrases in bulk.

            • nosioptar 19 hours ago

              You can use ublacklist without a list and just block shit sites as you see them.

              I'm loving being able to search for something without getting results from garbage sites like howtogeek, stackoverflow, MSN, Pinterest, etc.

            • james-bcn a day ago

              The Kagi search engine has a way in the settings to bulk-upload lists of domains to block (or upvote). Has anyone uploaded a list like this to it?

              I may do that.

              • freedomben 21 hours ago

                That was my thought as well. Their UI is great for one-at-time operations, but an API endpoint I could curl and sync with a local file I keep in git would be killer.

                Although, using this via the extension would make it cross-platform so the block affects kagi and google, which could be nice.

                Although, that would require manual syncing between devices, which would not be nice.

                Although, uploading it to kagi through API doesn't mean I have to not use the extension, so having the cake and eating it too may be possible.

                • thoughtpalette 20 hours ago

                  Was thinking of that as I was browsing this doc! I just did the ole' reddit.com -> old.reddit.com redirect via kagi yesterday.

                • gtfiorentino 19 hours ago

                  Hi @popcar2 — how are you sourcing the domains for the blocklist? We'd like to evaluate those domains and consider whether they should be removed from DuckDuckGo as spam. You can also report a site directly in the search results by clicking the three-dot menu next to the link and selecting "Share Feedback about this Site".

                  • popcar2 19 hours ago

                    Hi! I'm mostly going through them manually. Not all of the domains in the list are literally spam - most of the list also includes misleading sites like corporate blogs that trick people into downloading their software.

                    You might be interested in the AI spam/low effort section though, one that tops DDG often are these AI generated tech articles: https://github.com/popcar2/BadWebsiteBlocklist/issues/1

                    They're the same site under different domains, you can tell it's AI by its writing style, how much they churn out per day, how little info there is about who's writing it, how similarly the about pages are written, and how the same article is suspiciously also in similar-sounding sites.

                    Another one I just caught today that was on top of page 1: https://github.com/popcar2/BadWebsiteBlocklist/issues/84

                    I'll be sure to report these sites as I'm adding to the list, thanks.

                    • gtfiorentino 19 hours ago

                      Got it, that's good to know, thanks! I've added the domains in the AI spam / low effort section to our list of user reports for review.

                  • shortformblog 21 hours ago

                    The problem with a list like this is that a “bad website” is in the eye of the beholder. I’m not saying that there’s anything wrong with you personally not liking the Shopify or the Semrush blog. But I think that everyone else has their own calculus.

                    It’s the same reason why social media blocklists can be problematic—everyone’s calculus is different.

                    My suggestion is that you promote it as a starter and suggest that users fork it for their own needs.

                    • manx 40 minutes ago

                      Some kind of community notes consensus system could make sense here to find common ground. When a diverse set of people agree that a site should belong to the list, only then it is added.

                      • swayvil 20 hours ago

                        Some kind of democratic process. Where membership and blacklist are both something arrived at democratically.

                        It could be simple.

                        Good?

                        • shortformblog 18 hours ago

                          Seems like a kit that can be personalized across broad categories might be a better bet. By putting the onus on one list you don’t solve the main problem, which is that the list might block things you’re fine with.

                      • edm0nd 20 hours ago

                        I recently started a crypto scam/phishing blocklist if you wanna roll these into your list as well.

                        also works well with Pi-hole and other platforms.

                        https://github.com/spmedia/Crypto-Scam-and-Crypto-Phishing-T...

                        • Kuinox 2 hours ago

                          I don't understand why so many corporate blogs are blocked. Most of them are about their product, or about the industry in general.

                          - For example, the Kaspersky blog doesn't look bad.

                          - The CCleaner blog is just a list of updates.

                          • popcar2 2 hours ago

                            They aren't, that's why they're blocked. You can see more detail in the issues but the blocked corporate blogs often make clickbait to advertise their product, like https://www.ccleaner.com/knowledge/windows-11-problems-how-t...

                            • owenthejumper an hour ago

                              I think that's misleading on its own and that makes this list useless. By that logic you should block every single corporate blog out there.

                              This looks like someone's personal list not a serious effort.

                            • jwx48 2 hours ago

                              Because corporate blogs are predominantly nothing but marketing fluff that dominates search results so thoroughly that they drown out any actual useful information.

                            • MortyWaves a day ago

                              Who’s going to be the first to make the PR for Medium and “dev.to”?

                              • CamperBob2 21 hours ago

                                Why Medium?

                                • bluetidepro 20 hours ago

                                  Likely because of the annoying paywall on most of Medium.

                              • nayuki 19 hours ago

                                Related: Freya Holmér - "Generative AI is a Parasitic Cancer" https://www.youtube.com/watch?v=-opBifFfsMY (1h19m54s) [2025-01-02].

                                She talks at length about how pages of AI-generated nonsense text are cluttering search results on Google and all other search engines.

                                • huesatbri 2 hours ago

                                  She really put my thoughts into words regarding this. The “who is this for” part really hit home.

                                • the_snooze 21 hours ago

                                  This is one of those features a proper search engine (i.e., not a thinly-veiled advertising network) should have. If users can customize their search results and share their sorting/filtering methods, then that presents a large number of constantly-moving targets that greatly drives up the cost of SEO. There's no "making the Google algorithm happy." Instead, it becomes more "making the users happy."

                                  • bityard 21 hours ago

                                    Google used to do this years ago but clawed it back around the time they started removing _all_ customizations under the premise of, "we know how to customize your results better than you do."

                                    DuckDuckGo has site blocking. The problem is that there are so many SEO-optimized blogspam, referral link, and other "garbage" sites that you could spend a lifetime blocking each one individually before you get any actual work done. And it's only getting worse now that LLMs can generate a whole web site for you in a matter of minutes. I imagine a dedicated individual could provision several thousand websites/blogs per day, just chock full of ads and referral links.

                                  • Night_Thastus 21 hours ago

                                    I've been using GoogleHitHider, which also works on other search engines like DDG. Worked well for many years. It's a list I curated myself though for personal use, I definitely wouldn't mind seeing what other people had.

                                    • mrweasel 21 hours ago

                                      I love that it just includes all of msn.com.

                                      • mrbluecoat 20 hours ago

                                        I guess the author's "boiling hatred to bad tech support articles" leads to some overreach

                                        • popcar2 19 hours ago

                                          This doesn't block you from visiting MSN, but it does stop their articles from appearing in search. The reason is that MSN just re-hosts articles from other sites rather than provide anything of value. MSN posts often outrank their original source because Microsoft is pushing it hard on Windows/Bing/Edge.

                                          For example: https://www.msn.com/en-us/movies/news/jodie-foster-heckled-a... is just a re-hosted version of https://www.independent.co.uk/arts-entertainment/tv/news/jod...

                                          My hope in hiding MSN is to allow the original sources to rise back up to the top.

                                          • roskelld 6 hours ago

                                            I'm going to have a look at this. I currently run a script that adds `-site:msn.com` to all of my DDG searches. It's kinda ugly.
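
                                            For reference, a script like that might look roughly like the sketch below. It is not the actual script mentioned above; the helper names are hypothetical, and the only things assumed are the `-site:` operator and DuckDuckGo's `?q=` query parameter.

```python
from urllib.parse import urlencode


def with_exclusions(query, blocked_domains):
    """Append one -site: exclusion per blocked domain to a search query."""
    exclusions = " ".join(f"-site:{domain}" for domain in blocked_domains)
    return f"{query} {exclusions}".strip()


def ddg_url(query, blocked_domains):
    """Build a DuckDuckGo search URL with the exclusions applied."""
    params = {"q": with_exclusions(query, blocked_domains)}
    return "https://duckduckgo.com/?" + urlencode(params)


if __name__ == "__main__":
    print(ddg_url("how to partition a hard disk", ["msn.com"]))
```

                                            This gets unwieldy fast: every blocked domain makes the query longer, so it only really works for a handful of sites.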

                                            • qingcharles 19 hours ago

                                              My small rebuttal to that is that msn.com occasionally has articles they've sucked in that are paywalled on the original sites.

                                              But I have archive.is for the most part to get around that issue.

                                            • qingcharles 19 hours ago

                                              msn.com is actually useful from time-to-time as they have syndicated articles which are otherwise stuck behind paywalls on other sites.

                                          • noleary 20 hours ago

                                            This is cool! Not entirely sure whether I think it's a good idea, but I wonder if it'd be useful to come up with a way to tranche websites.

                                            Some sites are complete garbage and should be blocked, of course. Others (e.g., in my experience, Quora) are sometimes quite good and sometimes quite bad. Wouldn't be my first choice, but I've found them useful at times.

                                            For a given search, maybe you try with the most aggressive blocking / filtering. If you fail to find what you're looking for, maybe soften the restriction a bit.

                                            Maybe this is overwrought...

                                            • lambdaone a day ago

                                              I think there's big potential in using DNS blacklists for this: they have the advantage of being massively scalable and simple to maintain, and configuring clients to use them is also easy.

                                              The scalability comes from the caching inherent in DNS; instead of having to have millions of people downloading text files from a website over HTTP on a regular basis, the data is in effect lazy-uploaded into the cloud of caching DNS resolvers, with no administration cost on behalf of the DNSBL operator.

                                              Reputation whitelists (or other scoring services) would also be just as easy to implement.
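
                                              A minimal sketch of what a client-side lookup could look like, assuming the usual DNSBL convention: query `<domain>.<zone>`, where an answer in 127.0.0.0/8 means "listed" and a failed lookup means "not listed". The zone `badsites.example` is a placeholder, not a real service, and the function names are made up for illustration.

```python
import socket


def dnsbl_query_name(domain, zone):
    """Build the name to look up: the checked domain prepended to the list's zone."""
    return f"{domain.strip('.').lower()}.{zone.strip('.')}"


def is_listed(domain, zone, resolve=socket.gethostbyname):
    """Return True if the DNSBL zone answers with a 127.x.x.x address."""
    try:
        answer = resolve(dnsbl_query_name(domain, zone))
    except socket.gaierror:
        # No such record (or lookup failure): treat as not listed.
        return False
    return answer.startswith("127.")


if __name__ == "__main__":
    # Placeholder zone, so this is expected to print False.
    print(is_listed("spam.example.com", "badsites.example"))
```

                                              The `resolve` parameter is there only so the lookup can be swapped out; the caching comes for free from whatever resolver the client already uses.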

                                              • bityard 21 hours ago

                                                DNS blacklists work fine for blocking access to sites or certain known-sketchy FQDNs/domains but do nothing to hide low-quality search engine results, which is what this is all about.

                                              • antithesis-nl a day ago

                                                So, if you already run uBlock Origin (and of course you are), you can use this list without installing any additional extensions by going to 'Filter lists' in the uBlock settings, then Import, then enter https://raw.githubusercontent.com/popcar2/BadWebsiteBlocklis... as the URL.

                                                Not saying you should, just that you could...

                                                • popcar2 a day ago

                                                  I think this would block you from visiting the websites, but they'd still show up on search results. uBlacklist doesn't block them, but rather just hides them for search engines which IMO is a better approach.
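
                                                  The difference is roughly this: hiding filters the result list itself instead of intercepting the navigation. A minimal sketch, not uBlacklist's actual implementation; the function names are hypothetical and the blocklist is assumed to be a set of bare domains.

```python
from urllib.parse import urlsplit


def is_blocked(url, blocked_domains):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def hide_blocked(result_urls, blocked_domains):
    """Return the search results with blocked domains filtered out."""
    return [u for u in result_urls if not is_blocked(u, blocked_domains)]


if __name__ == "__main__":
    results = ["https://www.msn.com/en-us/a", "https://example.org/b"]
    print(hide_blocked(results, {"msn.com"}))
```

                                                  Nothing stops you from opening a hidden site directly; it just never shows up in the results.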

                                                  • antithesis-nl a day ago

                                                    Yeah, I just tested this, and you're right. Going to google.com and entering solveyourtech as a search term, did indeed still return their site as a result.

                                                    On clicking it, uBlock blocked my visit, but that may or may not be enough for you, in which case an additional plugin may be warranted.

                                                • ge96 21 hours ago

                                                  Tangent: I may laughably use Malwarebytes, but when I'm image searching on Google and it stops me from opening a picture with an adware alert, I'm like "oh damn"... I use an adblocker and generally don't do anything sus on my main OS, but yeah, I'm still unsure: am I safe? (paranoia ensues)

                                                  I use a VM in other scenarios, but is even that properly separated?

                                                  • miyuru 20 hours ago

                                                    Brave has Goggles that do exactly this. You can even share the list with others.

                                                    https://search.brave.com/goggles/discover

                                                    • dmix 20 hours ago

                                                      does the msn.com one block their news site?

                                                      • ColdTakes a day ago

                                                        DuckDuckGo and Kagi allow you to remove entire sites from search results and it is the best feature of these websites.

                                                        • dylan604 a day ago

                                                          I get Kagi since you pay for it, but doesn't using a logged in account with DDG defeat the purpose of using DDG? How does having a search history associated with your DDG account improve things? You're just moving who knows your search history from one org to another

                                                          • szszrk 21 hours ago

                                                            If you are not logged in, your search history may be "not associated with your account", but it is still associated with you.

                                                            It's more a matter of whom do you trust. Private mode in browsers still gathers unique user IDs, fingerprinting is widespread and fairly precise. The "logged in" part doesn't change that much.

                                                            • ColdTakes 21 hours ago

                                                              You don't need an account for DDG. I assume the record keeping for which sites you exclude is cookie or session based.

                                                          • renegat0x0 19 hours ago

                                                            I think it could also be accomplished using searxng, and blocking it there.

                                                            • lubujackson 20 hours ago

                                                              download.cnet.com serves up spam nowadays? How far the mighty have fallen.

                                                              • Animats 11 hours ago

                                                                Does Google still allow that in an add-on?

                                                                • swayvil 20 hours ago

                                                                  How do you ensure good contributors and good contributions?

                                                                  Do you have a forum where you discuss prospective contributions etc?

                                                                  • popcar2 20 hours ago

                                                                    I'll manually review all the block/unblock requests, and you can open a discussion in the issues page. There's a template and tag for discussion posts.

                                                                    • swayvil 20 hours ago

                                                                      One man (with help) against the internet.

                                                                      • popcar2 20 hours ago

                                                                        Yep. If all else fails, the list is CC0 so anyone can fork it and carry the torch hopefully!

                                                                  • qiine 15 hours ago

                                                                    Thank you for your service

                                                                    • sandropuppo a day ago

                                                                      What about just using perplexity? It's already doing that I think.

                                                                      • popcar2 a day ago

                                                                        I don't think AI search engines are a proper replacement for regular ones. The summaries are almost always worse than finding a proper article, which gives better context and editing (and doesn't make many subtle mistakes).

                                                                        • verdverm 21 hours ago

                                                                          I think it depends on the specific query. If I'm looking for something specific like the date of a holiday, traditional search is better. If I'm querying about a programming issue that would land me on several SO pages that I need to piece together to get the full answer, LLM chat can save me a bunch of time and answer in terms of my specific variable names

                                                                          • fn-mote 10 hours ago

                                                                            > If I'm querying about a programming issue that would land me on several SO pages that I need to piece together to get the full answer

                                                                            If I have to piece together multiple SO answers, the issue is complex enough that I better actually understand it. I am not at the point where I am trusting an LLM for this.

                                                                            > LLM chat can [...] answer in terms of my specific variable names

                                                                            Which has value 0 for me. What are you doing that this is an asset? Generating a huge block of code? Write a function!

                                                                            Edit: in fact, parent is the author of a complex configuration management tool (see profile) so getting a big block of code regurgitated with the correct variable names is probably an asset for them.

                                                                            • verdverm 10 hours ago

                                                                              Here's an example I like to share, re: when LLMs can be better than doing the searching myself: https://topicalsource.dev/chat/023e7e54-947b-490d-bcd8-89cc2...

                                                                              I understand the concepts, it's not complex, but it's something I don't use or do daily. One of the other differences with working with LLMs over search is that I can provide a lot more input as part of my query. That context is often used within the answer, a much better experience than having to map multiple other examples onto mine.

                                                                              Also, I am not the author of a complex configuration management tool. Not sure what you are misreading. I have authored a deterministic code generation tool, maybe that is what you mean? It however is an alternative to LLM code generation that existed prior to the current LLM hype cycle

                                                                              If you don't like LLMs, that is totally fine. You don't need to put down other people's usage or question why or how they get value out of it with feigns. Perhaps you might consider spending more time with the new knowledge tools that will not be going away. I just tried out the new Gemini Research assistant with the query...

                                                                              """ google has an open source project that enables pull requests to be embedded into a git repository and also comes with a gui. Can you help me find this project? """

                                                                              It took a couple of minutes and came back with exactly the project I was looking for. Saved me a bunch of time and headache trying to find this: https://github.com/google/git-appraise

                                                                              I didn't have to know the exact search words or phrases, I didn't have to filter through multiple search results. I worked on this post while my LLM assistant did it for me