At a former job, we reverse engineered the trading APIs of most American retail stock brokerages (Fidelity, E-Trade, Robinhood, TD Ameritrade, etc). We did it by rooting an iPhone and using Charles Proxy to grab the unencrypted traffic.
I learned a lot from that experience, and it's also just plain fun to do. We did get some strongly worded letters from Robinhood though, lol. They tried blocking our servers but we just set up this automated system in Digital Ocean that would spin up a new droplet each time we detected a blockage, and they were never able to stop us after that.
Fun times.
I did almost this exact same thing back in ~2015 when I was in high school over Christmas break. I reverse engineered the anime streaming site Crunchyroll's API via their Android and PS3 apps using an HTTP proxy application and trial and error. I ended up having a proper HLS-based streaming player and an Android TV app back when their official Android app was still Flash-based. It was lots of fun!
2015/2016 was exactly when I was doing the above job. We could've hired you as an intern!
Also this would have been winter of 2014 and into early 2015, so a bit before :)
Sadly the CR forums are gone, so my rather popular thread that had feedback and support is long gone.
Maybe! I considered pursuing a job there at the time, but I opted to get a degree instead. Being located in the Midwest would've made it rather challenging anyways.
This reminds me of an internship I did a long time ago at an "insight generation company" that used to reverse engineer all of the flights/booking websites for price forecasts. Getting paid for automating scraping/reverse engineering APIs was always fun.
I wonder why they didn't block all Digital Ocean IP ranges. They only need residential customers to access Robinhood, so they could block everything else.
It's neither hard nor expensive to buy residential IP proxy services to get around that.
On the defender side, it's much funnier to poison the data of identified scrapers than to immediately ban them. Let them work out that their data has been altered for a while, clean up their datasets, and work to understand what identifies them as scrapers.
Definitely, but it's also a lot more complex to present credible looking false data than to simply reject a request.
There could be some legitimate proxy services set up on DO? Not sure, always wondered why.
> We did get some strongly worded letters from Robinhood though, lol.
Unsurprisingly, the most sleazy players are the first ones to go after someone accessing their services in ways they didn't anticipate or intend :).
Wouldn’t they block/suspend the account/credentials?
The product was for end users so the traffic was coming from any account. They never discovered the accounts we used for testing. At least, the American ones didn't. When we finished with US companies, we went for Singapore.
Our CEO was friendly with an investor who had an account at some big Singaporean trading firm called Lim Tan. He gave us his account credentials and I began working. A few days later my boss comes over to my desk and says "stop whatever you're doing right now." Apparently my traffic had set off so many alarm bells that the CTO of Lim Tan was woken up at 2am. They permanently banned the investor, which I felt really bad about. What's crazy is that I wasn't even doing anything weird. I was just poking a bit at their authentication methods. That was when I learned that Singapore tech doesn't fuck around.
Always use puppet accounts for reverse engineering.
Reverse engineering the dudes holding your funds isn't a good idea to begin with. Too much risk. Better to work with them directly or switch to a better service which does feature APIs.
I wonder if using something like puppeteer or playwright to actually make the server think everything is being done client-side would still raise flags.
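For anyone curious what that looks like in practice, here's a minimal Playwright (Python) sketch: drive the real web client and read the API responses it makes, so all traffic originates from a genuine browser. The URL and the "/api/" filter are placeholders, not a real target.

    # Minimal sketch: let a real browser do the talking, then harvest
    # the API responses it produces. Requires: pip install playwright
    # plus `playwright install chromium`.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # Log every response that looks like an API call.
        page.on(
            "response",
            lambda r: print(r.status, r.url) if "/api/" in r.url else None,
        )

        page.goto("https://example.com/portal")  # placeholder target
        page.wait_for_timeout(5_000)  # let the SPA fire its requests
        browser.close()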
Scraping a well-built API at human speed often isn't terribly useful, and once you start ramping up the scraping, it's account creation/patterns/use frequency that will set an alarm.
Faking real user clients won't prevent these alarms.
The purpose of our work wasn’t scraping - we were building a unified UI where our users could trade with any vendor of their choosing. Kinda like Plaid but specifically for retail stock trading. So the goal was to implement their trading API.
You can just.. do that? I think I discovered why all my ideas suck.
Yep. Though it’s really hard. To capture the US market (E*Trade, Fidelity, TD Ameritrade, Scottrade, Schwab, Interactive Brokers, Robinhood) it took me and another engineer almost 2 years. It’s non-trivial.
I remember a friend telling me years ago about some scraping that happened where they worked: they scraped results from a bunch of different websites to create SEO websites, and they had a setup using Tor to avoid getting blocked. One of the websites the company actually depended on apparently rendered results using a whole assortment of visually identical but structurally (HTML-wise) different methods, returned randomly to hamper scrapers. They eventually gave that up because it turned out TV closed captions can be downloaded as XML, and that had what the company needed.
I tried to use Charles on Robinhood, but it looks like they use cert pinning to prevent it.
It reminds me of the expression "locks are to keep honest people out," in that code which runs on a device you control is code that you control: https://github.com/shroudedcode/apk-mitm#readme
[Frida](https://frida.re/) is fantastic for de-pinning certs in applications. Can be fiddly, but when it works, it just works™.
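If it helps anyone getting started, the Python side of a Frida session is only a few lines. The JS payload below is a stub; in practice you'd paste in an unpinning script like the HTTP Toolkit one linked elsewhere in this thread, and the process name is a placeholder:

    # Minimal Frida loader sketch. Assumes a rooted device with
    # frida-server running and `pip install frida` on the host.
    import frida

    JS = """
    Java.perform(function () {
        // Real cert-unpinning hooks would go here.
        console.log('[*] injected');
    });
    """

    device = frida.get_usb_device()
    session = device.attach("com.example.app")  # placeholder process name
    script = session.create_script(JS)
    script.on("message", lambda msg, data: print(msg))
    script.load()
    input("Press Enter to detach...")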
yeah I'm sure it's much harder today. I did this work like 8 or 9 years ago and I think fewer controls were in place at the time.
I've done the same for use in foreign currency exchanges. The adventure of reverse engineering protocols and finding security checks, etc was more fun than the actual accomplishment lol.
Can't one just list all of Digital Ocean's IP blocks?

Sure, then you can add in Hetzner or whatever and keep adjusting, but idk, if somebody keeps ban-dodging by using the same provider, it seems like you'd just try banning that provider early on?
The trick is to use the same provider. I did it with the Expedia API for a while. At the time they were using AWS, so running it on Lambda made it very tricky for them to do much about it. They were hardly about to block all AWS IPs and risk their own services or any of the "real" partners having issues.
Same sort of timeframe: a project I worked on did its networking via mobile hotspots on a bunch of Android phones, with SIMs from a provider that used CGNAT. If the target websites wanted to block that, they'd be blocking well over 10% of all mobile phones in Australia.
(Hmmm, all the devices we used then would have just stopped working with the shutdown on the 3G network here. I wonder if it's all broken, or if they've upgraded all those devices to 4/5G ones?)
Quite a few people do that. My grocery store's app and the Tesla app stop working if I bring up my vpn through DO. (I first set that up years ago because the legoland hotel's wifi was blocking reddit.)
You can pretty much bypass this with residential proxies.
Yeah idk, they should've been able to but for some reason they didn't.
Probably because they'd potentially be blocking "legitimate" users
Banning IP ranges from datacenters is pretty easy and common; surprising they couldn't combat that.
You were able to connect to those APIs without auth? As far as I know, they all require it.
No, we would use our own accounts, sourced from either the CEO/CTO or someone else.
Why couldn't they block you then? They should have been able to quickly disable the accounts.
Our product was built for end users, so the traffic coming from our servers could technically be from any account. But as to why we weren't blocked during testing, that I'm not sure about. It's been about 8 years since I did that work - I assume we had someone's account who wasn't obviously connected to the company.
Sounds like a version of Plaid lol
Yep it was very similar, just with more of a focus on retail trading.
This is how I made a better version of the nhl.com site [1] that has a better UI (you can see scores/schedules much more easily), is mobile first, has no ads, and responsiveness built in. I did the same for the AHL [2], and the PWHL [3].
[1] https://nhl-remix.vercel.app/ [2] https://ahl-remix.vercel.app/ [3] https://pwhl-remix.vercel.app/
I did a similar thing (improving a site I'm unaffiliated with) with Skill Capped which has videos to improve at League of Legends. My version has a search bar which they still haven't added in 4.5 years.
Make one for the NFL, please! I can’t stand how hard it is to get scores on their website. Haha
Funny you would mention that. A few weeks ago, I wrote a Python-based client library [0] for the website kicker.de [1]. It supports the NFL, too:
    import asyncio
    from kickerde_api_client import Api
    from kickerde_api_client.model import LeagueId

    api = Api()
    query = {'league': LeagueId.NFL, 'season': '2024/25'}
    season = asyncio.run(api.league_season(**query))

    print(season['longName'])             # 'National Football League'
    print(season['country']['longName'])  # 'USA'

    print([
        team['shortName']
        for team in season['teams'].values()
        if team['shortName'].startswith('B')
    ])  # ['Buffalo', 'Baltimore']

    day = season['gamedays'][18]
    print(str(day['dateFrom'].date()))  # '2025-01-05'

[0]: https://kickerde-api-client.readthedocs.io/en/stable/autoapi...

There's like a gazillion alternative scoring sites. Googling the team name also works while they're playing; Google usually has a good feed.
Poked around a bit. It's responsive and looks great on mobile. Kudos
For a minute I thought PWHL was short for "pwn-NHL".
I do exactly this, but for the company that I work for.
I'm on the dashboards and integrations team, and I don't have direct access to the codebase of the main product. As the internal APIs have no documentation at all, I'm always "hacking" our own system using the browser inspector to find out how our endpoints work.
There are so many of us doing this, haha. If you work at a big enough company or one with poorly documented APIs, it's just faster to reverse engineer the existing apps/UIs.
Just make sure you don't end up accidentally making one of those APIs semi-official.
I've seen a case where one team developed a temporary hack to automate some process in their product, then allowed another team making a sibling product to use it to test a possible feature; soon after, the possible feature became an actual one, and everyone seemingly forgot the API was a throwaway test. Over the years, as both products evolved, that feature got pretty flaky, and in one of many cross-team attempts at fixing it, someone from the original team finally pointed out that the whole thing was still relying on a temporary hack in the original product that was never intended to be productized...
Same here, but I do have access to the codebase. But there's no documentation about all the different APIs. Reading through the code also doesn't always explain what the API does, or what it's for. I work on a separate system that benefits from using these APIs from the main product, there is almost no overlap or communication between engineering departments. It's far from ideal.
> 75grand’s success was even met with jealousy from the college
That's a common story; at my university (in Padova, UniPD) something even worse happened. They tried hard to shut down an unofficial app (Uniweb) that was installed by most of the students, in favor of the "official" one, which was completely unusable (and probably born out of a rigged contract). In the end the best one won and became official, but only after a lot of struggle.
> They tried hard to shut down an unofficial app (Uniweb) that was installed by most of the students, in favor of the "official" one, which was completely unusable
Sounds like what happened with the Apollo app and Reddit
I’m just not sure jealousy is the correct word for this, though. Most systems don’t like these kinds of things, for a number of reasons.
In my (perhaps limited) experience, most of those reasons are just different ways of spelling "our official solution is shit, because we don't care and/or make money in some underhanded way".
Agreed that I’m not sure jealousy is the right interpretation of imitation being the sincerest form of flattery.
Having said that, speaking as a Macalester alumnus, I wouldn’t put it past them to be a bit petty behind the scenes :)
I think the word usage is tongue in cheek
My high school used a system so bad I stopped caring about my grades, I just didn't want to log into that crap more than once a year. I couldn't believe my eyes when I got into university and saw they were using the exact same shitty system.
Reverse engineered just enough of that .net monstrosity to get the data I cared about. Replacing the school's portal with my own script made me so happy. They kept breaking my scraper constantly though, got to the point it was too much effort to maintain it. Worst part is I don't even think they were doing it on purpose. They were just bad in general and couldn't keep their site together.
I'm curious, what system was it? My school uses Synergy's stuff which sounds similar
Anyone come up with good techniques for reverse engineering websockets?
It's especially annoying since many use binary message formats, and there isn't a great way to document an arbitrary binary message protocol.

A couple of techniques I'm trying out (a quick frame-dumping sketch follows the list):
- websocat and wsrepl for reverse engineering interactively: https://github.com/doyensec/wsrepl
- kaitai struct for documenting the binary message formats: https://kaitai.io/
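As a starting point before any of that tooling, a few lines of Python (using the websockets library) will at least get the raw frames on screen for analysis; the URL is a placeholder:

    # Minimal frame dumper: connect and hex-dump binary messages so they
    # can be diffed or fed into a Kaitai Struct definition later.
    # Requires: pip install websockets
    import asyncio
    import websockets

    async def dump(url: str) -> None:
        async with websockets.connect(url) as ws:
            async for message in ws:
                if isinstance(message, bytes):
                    print(message.hex(" "))  # binary frame as hex bytes
                else:
                    print("text:", message)  # some protocols mix in JSON

    asyncio.run(dump("wss://example.com/socket"))  # placeholder endpoint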
I'm very interested in this, I do a lot of protocol debugging. Kaitai looks very neat - is that the most popular format for this kind of thing, or are there other popular options I should look at too?
Just lately somebody posted about the Imhex tool's DSL: https://xy2i.blogspot.com/2024/11/using-imhexs-pattern-langu...
I recently wanted to reverse engineer some Websocket packets for a game I was playing. I used BurpSuite as a proxy to bypass the SSL encryption. It also has a pretty handy tool that will monitor all websocket traffic.
After that I used ImHex, pretty much exactly like in that blog, to reverse engineer the websocket packets. The DSL is a little finicky, but once you wrap your head around it, it's very nice and powerful.
Thanks for sharing your experience; it's valuable, as I've been considering using ImHex and its DSL lately.
The problem with this is that, a lot of the time, as they keep ramping up their anti-abuse measures, your old code breaks and needs to be rewritten (especially if it's an API you intend to use for a long-term project).
It's just an endless cat & mouse game.
Sure, but all software rots.
I did the same thing long (well, not that long) ago for my school’s system too: https://github.com/saagarjha/break. In retrospect this was an excellent use of time because when I made this I barely knew what JSON was and ended up reimplementing some protocols from scratch (HTTP basic auth, WebDAV) that I learned much later were actually standardized and I had actually known them all along ;) Alas, the service got bought out by some vulture/bodyshop PE firm and the app ended up outliving the servers it was hitting. But until then I got a lot of great email from high school kids telling me how much I improved their experience or asking if they could build their own thing on the APIs they found in my app after trawling GitHub.
I actually enjoy reversing APIs. A game I really liked to play announced EOS last year. I ended up capturing a bunch of mitm traffic and built my own API instead and now head up a private server for it.
I don't dare monetize that project, but I wish I could use my skills to make some extra money at reversing APIs. Wouldn't know where to begin though.
I feel the same way about monetizing my api reverse engineering skills. I taught myself to program by reverse engineering api's, and it's what led me to pursue computer science. I'm pretty good at it, but I can't seem to find a role to flex my skills.
I have the same wish, and wish that for you, but tangentially I wanted to offer that if you were to publish the game server you built, to the very best of my knowledge alternative implementations are protected by US law (assuming that applies to you, of course) and then everyone in a similar boat could sponsor your project, straight up donate to you, or at very least you'll have made the world a better place for the players in a similar situation as you were
The game is Gundam Evolution, so Bandai Namco is the owner. They are known to be very litigious and protective of Gundam IP.
The game was a multiplayer FPS and involved cosmetic micro transactions.
Due to mismanagement they shut down in one year. It launched in late 2022 and EOSed late 2024.
I'm in the US, yes. The company is Japanese though and they may not care as much about US law when choosing to pursue. Even if they ultimately don't have a case, there's still enough gray area to torment-by-lawyer.
For that reason I chose not to take any cash for the project. I also don't distribute any game files. Trying to minimize risk as much as possible.
But yeah! It was a good community-building project. We're about to hit 10,000 members; I hope it leads to some profit-bearing connections.
You're a hero. I couldn't care less what laws say, companies that go after people like you deserve to be boycotted mercilessly. I haven't bought a single EA game since they C&D'd the people reverse engineering Battlefield 2 and 2142 network protocols. They just wanted to keep some beloved games alive. I will never forgive them for it.
Whatever you're doing, I hope you get away with it.
I appreciate it. So far so good. We've been hosting lobbies for about 6 months now.
Also, I made a typo. They shut down in late 2023, not 2024. So between November 2023 and March 2024 there was no public place to play the game; we've brought it back since, though.
Hope Bamco lets us be in our little corner for now!
As I read through these comments I'm reminded of recon techniques commonly used in bug bounties. Reverse proxies, decompiling APKs, parsing JS files to find endpoints, etc. I often went down rabbit holes in the recon phase and never found bugs but I had fun in the process so I considered it a win anyway.
I also implemented a bot mitigation system for a large international company, so I got to see the techniques used from the other side. Mobile phone farms and traffic from China were the most difficult to mitigate.
Why was China in particular difficult to mitigate?
Sick app btw. Funny this comes up, because I'm working on the exact same thing for my school. Note that if your school uses Canvas, its API is well documented and has GraphQL endpoints.
Canvas and several other LMSes all use SCORM Cloud on the backend. I discovered this while researching something work-related.
Getting into the internals of an LMS is tricky, but surprisingly very interesting and very fun.
Someone did this to the Ring mobile app to get a dataset of all (public) Ring cameras across the US
https://gizmodo.com/ring-s-hidden-data-let-us-map-amazons-sp...
I wonder how difficult it would be to combine many of these techniques into some automated script that dumps a manifest of the different types of undocumented APIs there are. LLMs have also been shown to be pretty good at answering semantic questions about large blobs of minified code, perhaps there could be some success there?
There are too many fiddly bits of which endpoints return what data in which shape, which requires a custom solution each time. An LLM would probably have a very easy shot at this problem though.
It's much more straightforward if you can find a GraphQL, Swagger, or OpenAPI spec to automate conversion I'd imagine.
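The "parse the JS files to find endpoints" step mentioned upthread is easy to prototype; a rough Python sketch (the regex is a heuristic, not a complete solution):

    # Pull anything that looks like an API path out of a downloaded,
    # minified JS bundle. Purely illustrative; real bundles need more
    # patterns (template literals, string concatenation, etc.).
    import re
    import sys

    ENDPOINT_RE = re.compile(r'["\'](/(?:api|v\d+)/[A-Za-z0-9_\-./{}]*)["\']')

    def extract_endpoints(bundle_path):
        with open(bundle_path, encoding="utf-8", errors="ignore") as f:
            return sorted(set(ENDPOINT_RE.findall(f.read())))

    if __name__ == "__main__":
        for endpoint in extract_endpoints(sys.argv[1]):
            print(endpoint)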
Launch HN: Integuru – Reverse-engineer internal APIs using LLMs
> The error messages helpfully suggested fields I hadn’t known about by “correcting” my typos.
Glad to see this being called out. Sure, I get why it's convenient. Misspelling a field by one character is a daily occurrence ("activty" and "heirarchy" are my regulars). The catch is that spellchecking queries and returning valid fields in the error effectively reduces entropy by both character space and message length, varying by the type of distance used in the spellcheck.
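To make the leak concrete, here's a toy illustration (not any particular server's implementation) of how a "did you mean?" error confirms near-miss field names:

    # A toy "did you mean?" error handler, similar in spirit to what many
    # GraphQL servers do. Each helpful suggestion hands the prober a
    # valid field name it didn't know about.
    import difflib

    HIDDEN_FIELDS = ["activity", "hierarchy", "internalAuditLog"]

    def error_for(bad_field):
        close = difflib.get_close_matches(bad_field, HIDDEN_FIELDS, n=1)
        if close:
            return f'Unknown field "{bad_field}". Did you mean "{close[0]}"?'
        return f'Unknown field "{bad_field}".'

    print(error_for("activty"))        # leaks "activity"
    print(error_for("internalAudit"))  # leaks "internalAuditLog"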
It's really fascinating watching the constant push and pull in projects like yt-dlp and places like YouTube and TikTok. So many interesting techniques to make things more difficult for reverse engineering or making unofficial requests. There is even a tiny JavaScript engine iirc that calculates a special value that YouTube uses to verify requests.
Companies should take the existence of things like this as a signal that something is wrong with their product. There's a lot of things wrong with YouTube these days. Their ad engine is just out of control, often sticking ads at both the beginning and the end of short videos, sticking ads in the middle of songs on music performance videos and so on. Amazingly, they've managed to make it worse than TV.
This logic only applies for companies in a competitive business environment.
YouTube is a video monopolist - they know that it is an awful service for viewers, but there's no need to improve things because where else would viewers go?
Unfortunately, they are also part of a conglomerate that has reached full dominance in several online markets that are not growing. And they need to show quarter-on-quarter growth to shareholders. So you can't capture more market share, and there aren't any more people discovering the Web, so you stick additional ads everywhere. These people know exactly what they are doing.
Most of these techniques are extremely old and very outdated.
Teams that I've seen working on apps now implement much stronger checks on their APIs, especially attestation methods like SafetyNet on Android and DeviceCheck on iOS, which make it much harder than just running strings to see them.
And most apps are now encrypted so you just see junk in the logs.
And on the web side, fingerprinting is rampant and there are JS challenges from Cloudflare, Imperva, etc., which make it trickier. It's frustrating to run a whole browser with a virtual screen and load the whole page, which is of course like 15 MB of JS and other trash, just to do a very simple thing.
Granted, smaller fish like the ones OP is referring to generally don't have aggressive anti automation measures in place, so it can be easy...but generally these techniques don't work if the operator has put the proper measures in place.
take a look at https://xhr.dev/, a product I built to avoid bot detection from things like cloudflare, imperva, aws waf, and others
What does the $500 a month get me? Infinite resources to scrape all of LinkedIn?
>self host (Docker): $60k/yr
lmao ok
Frustrating? Yeah! but it works SO great! I especially like Playwright in this context, it can do pretty much anything and is a joy to use.
Is there a drop-in "thing" for using DeviceCheck? I would guess that something like Auth0 uses it (or maybe not? [0]). It seems like this could be a feature in any API Gateway / WAF'y product?
Not that I'm hoping for it, I too like to play around like OP. But I'm surprised how little I've encountered it in the wild.
[0]: https://github.com/search?q=org%3Aauth0%20DCDevice&type=code
I think good APIs are one of the most important and least obvious advantages of SPAs.
A technique not covered here is setting up a wifi hotspot to impersonate a device. I wanted to reverse engineer the protocol used by an inexpensive action camera. I ran the Android app through a decompiler (JADX), and the app supported many different types of cameras, so it wasn't clear which calls were related to the one I owned.
So I set up a Raspberry Pi with the same SSID as the camera and logged the first API call. Then I ran that call against the camera, and wrote a small Python script on the Pi to return the same API result. I did that call by call until I'd worked everything out. I wrote it up here: https://www.hotelexistence.ca/reverse-engineer-akaso-ek7000/
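For anyone wanting to try the same trick, the replay server really can be that small. A sketch with Python's standard library (the path, canned body, and port are placeholders for whatever you log from the real device):

    # Impersonate the device: answer each endpoint with a response
    # captured from the real camera, and log anything unknown so the
    # next call can be worked out. Run with sudo if binding port 80.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CANNED = {
        "/api/status": {"battery": 87, "recording": False},  # placeholder
    }

    class MockCamera(BaseHTTPRequestHandler):
        def do_GET(self):
            print("app requested:", self.path)  # reveals the next endpoint
            body = json.dumps(CANNED.get(self.path, {})).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("0.0.0.0", 80), MockCamera).serve_forever()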
bund.dev is a group of volunteers doing exactly that for APIs run by the German government. I met Lilith at a meetup recently. Her talk was super interesting.
I once talked with a guy running a scraping company. A client of his wanted all products from all of his suppliers in a single interface. One really slow website was constantly changing things specifically to make scraping impossible. He had to fix his scraper every few hours, then they'd change everything again.

I gave him the perfect solution: why don't you just call them and ask how much money they want for access to their product catalog? Give them tiny bits of information if they don't approve immediately. Tell them it is for one of their customers and that their site is too slow. They are the only supplier not in the interface, which is bad for them and bad for you. If they still refuse, offer them access to their competitors' data at a modest fee.

He couldn't stop laughing; he had never considered it. The next morning he put in the call. They gave him access immediately; they were so happy to finally get rid of his crawler messing up their analytics data. Apparently the people on the other end of the tube also hadn't slept. They had a good laugh about it.
> Mobile apps have no choice but to use HTTP APIs. You can easily download a lot of iOS apps through the Mac App Store, then run strings on their bundles to look for endpoints.
Are there any good tutorials for that? 'strings' is not the greatest name to search for.
Just to clarify a bit more for those who are new to strings, and because the audience for the post may lean towards people newer to reverse engineering:
While most of the time you're dealing with variables and such in programs, at some point you have to hardcode some information, such as URLs to query. So something like:

    import requests

    BASE_URL = "https://example.com"
    result = requests.get(BASE_URL + "/api/blah")
If we pretend this is in an Android app which is stored as an apk file (a zip file basically), running strings would spit out "https://example.com" and "/api/blah"
It'll also spit out anything that merely looks like a run of ASCII characters, so plenty of junk, but it's often quite handy as a starting point.
There are, of course, much more precise tools, such as man-in-the-middle proxying, but with that you'll only capture traffic for endpoints actually used by said app. The app may contain other endpoints left unused, rarely triggered, and so on.
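For intuition, the core of what strings does is small enough to sketch in a few lines of Python:

    # Toy strings(1): print every run of 4+ printable ASCII bytes.
    import re
    import sys

    data = open(sys.argv[1], "rb").read()
    for match in re.finditer(rb"[\x20-\x7e]{4,}", data):
        print(match.group().decode("ascii"))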
`strings` is the Unix command line utility of the same name.

    strings file

will tell you all of the ASCII strings in file.

    strings -el file

will do the same for UTF-16LE-encoded strings (very useful when dealing with Windows executables).

I was about to post the exact same thing; I don't see anything API-related for iOS apps on Google.
Edit: After some Googling and trying it on my MacBook, there is a native CLI tool called "strings". Supposedly it does the following: strings is primarily used to find and display printable character sequences in binary files, object files, and executables. Which means the author is probably looking at the hardcoded characters in the app binary(?) and searching for the API endpoints.
Just for context, strings is super commonly used when reverse-engineering anything. It's a great first step because it's easy, fast, and gets some decent clues to help you get your bearings in an unknown binary file.
Along with 'nm' and 'binwalk -e $FILE'.
"strings" is a Unix CLI utility that automates the equivalent of a tried and true practice on Windows: opening an executable file in Notepad.exe and scrolling around until you find human-readable text (usually near the end of the file).
If you want to RE some Android app's HTTP or even HTTPS traffic, it's more straightforward to use HTTP Toolkit and Frida. Might be a bit rough the first time you do it, but once it's set up it's a breeze. You can intercept calls and even modify them, etc.
https://httptoolkit.com/blog/frida-certificate-pinning/
https://github.com/httptoolkit/frida-interception-and-unpinn...
This is actually now built into HTTP Toolkit, so it's easier than it sounds - if you connect a rooted device, there's an "Android app with Frida" interception option that installs Frida and runs the scripts above for you against any given app on the device automatically. Funded by the EU! https://nlnet.nl/project/AppInterception/
Not sure what "strings" is, but I always use Charles Proxy to inspect traffic for any mobile app: https://apps.apple.com/us/app/charles-proxy/id1134218562
gnu strings: (first google result from "gnu strings" search) https://sourceware.org/binutils/docs/binutils/strings.html
strings is a unix program that shows you strings in a binary file
I googled a bit and found this, which points to some other tools.
https://www.corellium.com/blog/ios-mobile-reverse-engineerin...
(No affiliation.)
I've also sometimes done this, but usually because the existing UI isn't very good (and sometimes it doesn't work at all).
Did this with Redlib to allow access to Reddit, and have been building tooling to allow academic researchers to access data at a large scale at no cost :)
I’ve built the leading salary transparency tool in my country and one other, by reverse engineering an API. Took me a few hours; now 6,000 people use the tool. I think my tool is responsible for their changes to rate limiting lol
Anyone have this for Twitter? I want to remove most of my tweets but the official API costs $200
Maybe reverse it from the web app.
I deleted a tweet and saw this request:
HTTP POST https://x.com/i/api/graphql/VstuveVgh5q5jk7lmnVopqr/DeleteTweet

    {
      "variables": {
        "tweet_id": "12344567899123",
        "dark_request": false
      },
      "queryId": "VstuveVgh5q5jk7lmnVopqr"
    }

You can execute these from javascript in the browser if the auth part is too complicated.

Update: this is the pure javascript console way, if you don't want to write your own client doing HTTP posts.
I played with the console more and got these parts:
    // Find all tweets on screen (this gives you the tweet IDs too)
    document.querySelectorAll('a > time')

    // Click the "more" button on the first tweet
    document.querySelectorAll('a > time')[0].parentElement.parentElement.parentElement.parentElement.parentElement.parentElement.parentElement.parentElement.parentElement.parentElement.parentElement.querySelector('button').click()

    // Click delete on the tweet
    document.querySelectorAll('[data-testid="Dropdown"]')[0].children[0].click()

    // Confirm delete
    document.querySelectorAll('[data-testid="confirmationSheetConfirm"]')[0].click()
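If you'd rather script the HTTP route than click through the DOM, here's a rough Python sketch of replaying that DeleteTweet call. The header names are assumptions based on what the browser typically sends, and every token below has to be copied from your own devtools session:

    # Replay the captured GraphQL mutation outside the browser.
    # All credentials/IDs are placeholders pulled from devtools.
    import requests

    QUERY_ID = "VstuveVgh5q5jk7lmnVopqr"  # from the captured request above
    URL = f"https://x.com/i/api/graphql/{QUERY_ID}/DeleteTweet"

    HEADERS = {
        "authorization": "Bearer <copied-from-devtools>",
        "x-csrf-token": "<copied-from-devtools>",
        "cookie": "<copied-from-devtools>",
        "content-type": "application/json",
    }

    def delete_tweet(tweet_id):
        payload = {
            "variables": {"tweet_id": tweet_id, "dark_request": False},
            "queryId": QUERY_ID,
        }
        requests.post(URL, headers=HEADERS, json=payload).raise_for_status()

    delete_tweet("12344567899123")  # placeholder ID from the example above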
Shortly before they destroyed the API I made a little script to delete all my tweets. Was really really handy
I did this by writing a script in the console of twitter.com that walked all my tweets and deleted them one by one. Nothing fancy needed.
I don't have exactly what you're looking for, but I built a personal RSS-feed generator for twitter accounts that runs Selenium, and scans the DOM for data - using vxtwitter to fetch things in an easier way.
You could do something similar; as someone else suggested, just walk the feed via DOM elements.
You can request a GDPR deletion if you are in Europe? I don't know how Twitter actually conforms to this regulation if you are outside of the EU, or how they would even know you are outside of the EU.
Their delete post endpoints probably require auth. What’s to stop you from deleting someone else’s posts
…you can see what auth/headers are passed to the endpoint from said developer tools.
I agree with sibling commenters that automating the browser with DOM APIs is an easier route to go though.
Macalester ‘07 here. Back in my day their LDAP directory was public (at least on the campus network) which I used to scrape student & professor lists.
I’ve reverse engineered a few industry conference apps to more easily get the list of attendees (some conferences will literally only give you a pdf of scanned paper lists of contact info for attendees, which is insane, especially if you are paying to have a booth there). I’ve either decompiled the Android app, or ran mitmproxy on the device, or both, to figure it out. Does anyone have a recommendation for having a pre-rooted Android simulator with the tools you need installed and ready to go to make this a quicker process? I’d love to just drag and drop an apk into a simulator and start inspecting vs having to use a real device and all that jazz.
I found this recently, but I haven't tried it yet.
that is a cool find :)
Very interesting read! It gives some ideas on how to integrate data that aren't easy to access. Thanks!
Can someone recommend a long form guide that covers best practices for handling authentication and the like?
Is this what you’re looking for:
https://thecopenhagenbook.com/
It was shared on here a week or so ago.
If you want to do this and get paid for it come talk to us at Terminal49
Yes, I would like to do that, RE and network APIs were always a passion.
What's the best channel to reach you? I'll email you if you don't see this comment.
Reverse engineer their website CMS and find the admin email.
I got downvoted for not taking the hint, I suppose.
There’s probably a group of poor bastards on call at 5am, poring over alerts, and logs in Datadog etc wondering WTF is wrong with their applications.
Give ‘em a shout out by saying “hi” in a request parameter or something while you’re reverse engineering.
W
I went to the 75grand app listed in the article and saw a listing for Cafe Mac and did a double take. Apple's employee cafe is caffe Macs, so I was quite confused for a second
This approach is generally seen as unwanted by website owners (it's worth noting that automated API clients are distinct from regular user agents). As a “reverse engineer”, you have no idea how expensive it is for an endpoint to process a request.
Instead, I'd recommend reaching out to the website owners directly to discuss your API needs - they're often interested in hearing about potential integrations and use cases.
If you don't receive a response, proceeding with unauthorized API usage is mostly abusive and poor internet citizenship.
In my personal view, this seems a little overbearing.
If you expose an API, and you want to tell a user that they are "unauthorized" to use it, it should return a 401 status code so that the caller knows they're unauthorized.
If you can't do that because their traffic looks like normal usage of the API by your web app, then I question why their usage is problematic for you.
At the end of the day, you don't get to control what 'browser' the user uses to interact with your service. Sure, it might be Chrome, but it just as easily might be Firefox, or Lynx, or something the user built from scratch, or someone manually typing out HTTP requests in netcat, or, in this case, someone building a custom client for your specific service.
If you host a web server, it's on you to remember that and design accordingly, not on the user to limit how they use your service.
That's like saying if someone accepts cash that means you should be allowed to pay a $100 bill with a thousand dimes.
Just because you're right doesn't mean you aren't wrong.
The $100 tab paid in dimes causes severe inconvenience to the person trying to count them and to the person who has to take them to the bank to cash them in and wait for them to be counted again.
Their very reasonable question was: if you can't distinguish the reverse engineered traffic from the traffic through your own app in order to block it, then what harm is the traffic doing? Presumably it's flying under your rate limits, and the traffic has a valid session token from a real customer. If you're unable to single it out and return a 4xx, why does it matter where it's coming from?
I can think of a few reasons it might, but I'm not particularly sympathetic to them. They generally boil down to "I won't be able to use my app to manipulate the user into taking actions they'd otherwise not take."
I'd be interested to hear if there are better reasons.
"if you can't distinguish the reverse engineered traffic from the traffic through your own app in order to block it, then what harm is the traffic doing?"
If you really believe this you'll use a custom user agent instead of spoofing Chrome. :-)
Some websites use HTTP referer to block traffic. Ask yourself if any reverse engineer would be stopped by what is obviously the website telling you not to access an endpoint.
I'll add that end users don't have complete information about the website. They can't know how many resources a website has to deal with reverse engineering (webmasters can't just play cat and mouse with you because you're wasting their money), nor do they know the cost of an endpoint. I mean, most tech-inclined people use ad blockers when it's obvious 90% of websites pay the cost of their endpoints by showing ads, so I doubt they would respect anything more subtle than that.
If an endpoint costs a lot to run, implement rate limits and return 429 status codes so callers know that they're calling too often.
That endpoint will be expensive regardless of whether it's your own app or a third party that's calling it too often, so design it with that in mind.
Your app isn't special, it's just another client. Treat it that way.
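And from the caller's side, honoring those 429s is only a few lines; a sketch:

    # Be a polite client: back off when the server says 429, preferring
    # its Retry-After hint over a blind exponential delay.
    import time
    import requests

    def polite_get(url, max_tries=5):
        delay = 1.0
        for _ in range(max_tries):
            resp = requests.get(url)
            if resp.status_code != 429:
                return resp
            time.sleep(float(resp.headers.get("Retry-After", delay)))
            delay *= 2
        raise RuntimeError("still rate limited after retries")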
The only reason why "another client" can exist is due to limitations of the Internet itself.
If you could ensure that the web server can only be accessed by your client, you would do that, but there is no way to do this that can't be reverse-engineered.
Essentially your argument is that just because a door is open that means you're allowed to enter inside, and I don't believe that makes any sense.
The argument is that what you call "limitations of the Internet itself" is actually a feature, and an intended one at that. The state of things you're proposing is socially undesirable (and in many cases, anticompetitive). It's hard to extend analogies past this point, because the vision you're describing flies in the face of more fundamental social norms, and history of civilization in general.
It's not a limitation of the internet, it's a fundamental property of communication.
Imagine trying to validate that all letters sent to your company are written by special company-provided typewriters and you would run into the same fundamental limits.
Whenever you design any client/server architecture, the first rule should always be "never trust the client," for that very reason.
Rather than trying to work around that rule, put your effort into ensuring that the system is correct and resilient even in the face of malicious clients.
> If you really believe this you'll use a custom user agent instead of spoofing Chrome. :-)
Read up on the history of User Agent string, and why everyone claims they're Mozilla and "like Gecko". Yes, it's because of all the silly people who, since earliest days of the WWW, tried to change what they serve based on the contents of User-Agent header.
Not the greatest example. If someone has incurred a $100 debt to you, then, from a legal perspective, you must consider delivery of a thousand dimes as having paid the debt. You don't get a choice on that without prior contractual agreement.
https://uscode.house.gov/view.xhtml?req=granuleid:USC-prelim...
(In the United States at least)
I think it's the greatest example because it's something you're technically allowed to do but that you obviously shouldn't do because you're wasting other people's resources.
This is not an accurate reading of the code. Snopes quotes an FAQ on the US Treasury site (now missing, but presumably still correct) [0]:
> Q: I thought that United States currency was legal tender for all debts. Some businesses or governmental agencies say that they will only accept checks, money orders or credit cards as payment, and others will only accept currency notes in denominations of $20 or smaller. Isn't this illegal?
> A: The pertinent portion of law that applies to your question is the Coinage Act of 1965, specifically Section 31 U.S.C. 5103, entitled "Legal tender," which states: "United States coins and currency (including Federal reserve notes and circulating notes of Federal reserve banks and national banks) are legal tender for all debts, public charges, taxes, and dues."
> This statute means that all United States money as identified above are a valid and legal offer of payment for debts when tendered to a creditor. There is, however, no Federal statute mandating that a private business, a person or an organization must accept currency or coins as for payment for goods and/or services. Private businesses are free to develop their own policies on whether or not to accept cash unless there is a State law which says otherwise. For example, a bus line may prohibit payment of fares in pennies or dollar bills. In addition, movie theaters, convenience stores and gas stations may refuse to accept large denomination currency (usually notes above $20) as a matter of policy.
I specifically said "incurred a ... debt" and "without prior... agreement". As your source says
> In short, when a debt has been incurred by one party to another, and the parties have agreed that cash is to be the medium of exchange, then legal tender must be accepted if it is proffered in satisfaction of that debt.
You are correct that if cash is not accepted at all, or if payment is to happen ahead of the exchange of goods or services, you are not obligated to accept arbitrary cash.
And I never claimed otherwise
No, your claim is backwards—if the parties have agreed that dimes are valid payment of debt then that agreement must be upheld. Absent a prior agreement to accept dimes, the party receiving the money may refuse any combination of currency that they see fit.
In other words, an agreement isn't required in order to refuse legal tender, an agreement would be required to make it mandatory.
A court might decide that an agreement to accept cash without specifying in what form was meant to include dimes, but I see no evidence anywhere that a court has to rule that way if the contextual evidence suggests something else was probably meant.
"legal tender" is a term of art that specifically means a creditor must accept it, and your big quote clearly supported this, discussing only the common misconception that legal tender means it can't be refused in an offer to purchase. You are arguing backwards yourself.
No you're still wrong.
The law says that coinage is valid legal tender for an offer to settle a debt but the counterparty is not required to accept it... unless they contractually agreed to do so.
Only the U.S. government is required to accept payment in coins. Many states also require their agencies to accept payment in coinage but some have laws limiting the size of debts that can be paid this way.
That link is, again, about the difference between offers to purchase and offers to settle debts.
Think logically about this. What do you think legal tender even means otherwise? Why would you need a special term to denote a form of payment that a creditor can accept if they want to? I could accept settlement in jelly beans if I wanted to. The entire point is that you must accept legal tender, that is what makes it different from everything else.
I think it can be done responsibly. If you're not imposing any more traffic on them than you would be visiting their site manually, then using their APIs to get at structured data is actually win-win compared to the alternative (load the whole thing and scrape).
Where I'll agree with you is cases where people do this and impose way more traffic than is typical and often more than is necessary (i.e. no caching). But that's not really specific to reverse engineering apis, that's just about being a good internet citizen in general.
I'm of the opinion that a user agent is a user agent, and website owners shouldn't pick and choose what user agents they support. Target the behaviors that affect your infrastructure, not the means of access and algorithms used to process what you send.
> If you're not imposing any more traffic on them than you would be visiting their site manually, then using their APIs to get at structured data is actually win-win compared to the alternative (load the whole thing and scrape).
I agree, but that's not usually how it goes. From what I've seen, it's mostly very poorly written scripts with no rate limiting and no backoff strategy that will be hitting your API servers.
Would you rather have those poorly-written scripts hitting your APIs or have a poorly-written puppeteer script loading every asset you have—hitting the APIs along with everything else?
Casting shade on API reverse engineering when what you actually have is a failure to rate limit is throwing the baby out with the bathwater. Abusive users will abuse until you build in a technological method to stop them, and user-agent sniffing provably doesn't work to stop bad actors.
The concept of a flexible, customizable User Agent that operates on my behalf is a key idea that's foundational to the web, and I'm not willing to cede that cultural ground in the vague hope that we can make the bad guys feel bad and start just using Chrome like civilized people.
This is a weird attitude. The Internet we used to know meant that you could do things with it. Certainly you should not reverse engineer so as to get access to others accounts in the first place, but that should be impossible anyway.
I am aware that lots of companies have ideas(TM) about how you should be able to use their products(TM) and may even add these to their Terms of Service, a document that has somehow become the last refuge for the bureaucratic organisation desperate to maintain control when forced to connect things to great unbureaucratic internet.
To that, I say: too bad. I never signed up for the new version of the internet and I do not consider TOS to be anything but noise. I used Pidgin back in the day and would again if it worked.
This absurd idea that website owners should have any say about what runs on your computer/device is nonsense.
Well, during the first days of the pandemic I discovered that theathletic was using the same hardcoded API key for EVERY SINGLE ACCOUNT on their app. Granted, sports news when there were no sports, plus the near absence of interactive elements, made it pretty meaningless, except that you could impersonate any user to leave comments. So, bizarrely, sometimes you can reverse engineer right into other people's accounts. I'm just not quite sure how the devs (I think, looking at the comments in the code, that they were Czech) managed to get the gig, considering how well the site was able to gather talent and create great content in spite of the paywall, and it was sold to the NY Times for quite a bit of cash ($550 million). A $550 million app should not be using a hardcoded key in production.

The Times is really not a great tech company in any sense. If I were a bit less lazy/busy I'd dig more into their audio app, but frankly their reporting has gone downhill. I guess they're running the referral-mill strategy now, with all the ads they put into the app where there were none. Maybe they can hire some better programmers, or better reporters, for that matter.
>This absurd idea that website owners should have any say about what runs on your computer/device is nonsense.
No, they don't get a say about what software you run on your computer. But if your computer is accessing private APIs that I pay for, then I get a say in how you get to use it. It's also up to me to secure the APIs and prevent abuse. If I don't do that then you're essentially free to do what you like with the API until such time that I do lock it down. I'm also free to block your IP address and delete your account if you break the rules of use of the API that I am paying for. Don't like it? Too bad. You can pay for infrastructure to run your own damn APIs.
For public APIs, the same rules about public usage of any physical space should apply. If you can see it "from public" aka logged-out, then you can take photos or record it (aka access the API). If it's a restricted area, then the public isn't allowed there and it's up to the entity trying to protect it to secure it.
I've made my living for the last 7 years reverse-engineering non-public APIs from a service my company pays for. The service gets to set a rate limit, and they enforce it. They know what we're doing, and we are in contact often with their managers and engineers. They let us know if we're straining their systems, and we respond by limiting use of some of their more expensive APIs. We've almost DDoSed their system before, and this is a system millions of people subscribe to, that serves billions of pages per day. It's in everyone's best interest to get along: we don't abuse the APIs, and they don't cut us off for using them in a different way than they intended.
I would love it if this service took developers seriously and actually had a real developer program, but they do not, and they likely never will. It's more geared to consumers. But we depend on them in a very big way, so my job is reverse-engineering and scale up something that was never meant to be scaled. It's interesting work, but it also requires having an adult attitude and playing nicely with others. A little mutual respect can go a long way.
Nah.
Bots looking for exploits is rude, spamming an endpoint with more traffic than normal is rude.. but a human trying to figure out the API that you exposed to the internet? That's just fair play.
Also, better to ask for forgiveness than to ask for permission. The author is adding value to the world while hurting nobody, and the answer would likely be an automatic "no" anyway.
What’s wanted (or not) by website operators is irrelevant.
Adversarial interoperability is a cornerstone of the value of the internet and something we should fight hard to keep.
This would be akin to asking a shop owner if I'm allowed to pick up a pamphlet from an endless stack of pamphlets placed on the sidewalk. If they don't want the public to take pamphlets, don't put them where the public can reach them!
Same could be said about things in your front yard?
I’m not sure taking liberties with things you don’t own is always the best policy, nor is putting the entire responsibility on the owner.
I don’t think this is something you can boil down to a simple black and white.
Accessing a publicly available web service is not “taking liberties with things you don’t own”.
If you put a server on the public internet and I send it a message (assuming I’m not using ill-gotten credentials, etc.), anything it responds with is your problem, not mine.
> Same could be said about things in your front yard?
No. It could be said, but wouldn't be true - objects in a domestic front yard are nothing like pamphlets placed on the sidewalk.
Things on sidewalks are not free to take because you can touch them. Just like things in your front yard are not. Just because you can get to it, doesn’t mean it’s yours to take and act upon at will.
Not sure if you are deliberately missing the point. The convoluted example was more along the lines of "some company WANTS you to take the pamphlets sitting on the ground in front of their store... but only while bending over and not while squatting". Also, sidewalks are technically public property, while your front yard is private property.
Not sure why you are acting so obtuse.
Pamphlets are free to take by convention. And even if they weren't, they could be removed on the basis of being on public property anyway.
I agree. When scraping my school's portal which uses Canvas, my school actually allows it.
Sometimes you can get the green light just by reading API docs/School's privacy policy as they're usually obliged to have one (ofc this primarily applies to school APIs like in OP's article)
It's always fascinating to discover a comment on Hacker News arguing against using network services in any way without the written permission of their owners and operators. We'd have a very different internet today if people had been following this advice for the last 30 years. A much worse internet, to be clear.
A company can always document their API if they want power users to be informed about the company's preferred ways of doing things. They generally don't because they want to create a bunch of market friction so you're forced into using their proprietary client apps to interact with their services. I for one applaud efforts like the original post, and think it's an instance of outstanding Internet citizenship, where citizenship means working to preserve Internet societal norms like decentralization.
Serving ads is mostly abusive and poor internet citizenship but here we are.
The unspoken rule is to maintain a 3-second delay between requests.
This is such a cute take.