Congrats! What hardware do you use to run the inference 24/7? I built a simpler version that runs on low-end hardware [0] to recognize whether there's a person on my parcel, so I know when someone has trespassed and can trigger a siren, lights, etc.
Hello from the privacy crowd! Please use this responsibly. Tech can be a lot of fun and I encourage you to play around with things and I appreciate it when you push the boundaries of what is technically feasible. But please be mindful that surveillance tech can also be used to oppress people and infringe on their freedoms. Use tech for good!
This runs on a GeForce GTX 1060. A quick search says it draws 120 W. Maybe that's only the peak power consumption, but it's still a lot. Do commercial products, if there are any, consume that much power?
I have something similar. It's not tracking though. Drawing around 10 W on a Pi, around 7 W on a Jetson.
Not sure if I'm misunderstanding: you've got a GPU similar to a 1060 hooked up to a Pi?
OP is probably using an AI accelerator like this: https://coral.ai/products/accelerator which works great on a Pi and uses very little power. It will do the YOLO part, but you can't really expect it to do the multimodal LLM part, although you could try to run Florence directly on the Pi too.
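For a rough sense of what those wattages mean over a year of 24/7 operation, here is a back-of-the-envelope sketch. It assumes each figure is a constant average draw, which is probably pessimistic for the GPU since 120 W may only be its peak.

```python
# Back-of-the-envelope energy use for a device that runs 24/7.
# The wattages come from the comments above; treating them as a constant
# average draw is an assumption (the 120 W figure may only be peak power).

HOURS_PER_YEAR = 24 * 365


def annual_kwh(average_watts: float) -> float:
    """Energy in kWh for one year of continuous operation."""
    return average_watts * HOURS_PER_YEAR / 1000


if __name__ == "__main__":
    devices = [("GTX 1060", 120), ("Pi", 10), ("Jetson", 7)]
    for label, watts in devices:
        print(f"{label} at {watts} W: {annual_kwh(watts):.1f} kWh/year")
```

Multiply the result by your local price per kWh to get a yearly cost.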
A suggestion: I'd swap LLaVA for Florence-2 for your open-set text description. Florence-2 seems uniformly more descriptive in its outputs.
I found grounding-dino better than Florence and faster
Can you specify ideal hardware (camera, computer) to deploy the solution? Thanks
Here are hardware recommendations from another similar (and well-established) project: [1] [2]. Even though they don't recommend Reolink cameras, I have had both Amcrest and Reolink cameras working well with Frigate for more than a year now.
You'll want to find an IP Camera that supports the RTSP protocol, which is most of them.
If your budget supports commercial style or commercial grade cameras, looking at Dahua or Hikvision manufactured cameras would be a good starting point to get an idea of specs, features, and cost.
Maybe don’t buy surveillance hardware from those brands
Not OP, but the reason may be:
"US - FCC Ban: The US Federal Communications Commission (FCC) banned Dahua and Hikvision from new equipment authorizations in November 2022. Most products that use electricity require FCC equipment authorizations; otherwise, they are illegal to import, sell, market, or use, even for private individuals." (Jul 5, 2024)
Shame, they are the best cameras available.
A lot of the commercial-style or commercial-grade IP Cameras sold are rebadged Dahua or Hikvision products.
Compromised firmware or other backdoors are a concern for a wide range of products. With IP Cameras, a commonly recommended practice includes putting them on a non-internet accessible network, disabling any remote access, UPnP type features, etc. You can run IP cameras in an air-gapped configuration as well.
Home/consumer-grade cameras have plenty of shortcomings too.
If they are rebadged, that's fine :)
You're going to have to explain the reasoning here
“Analysts noticed that CCTV cameras in Taiwan and South Korea were digitally talking to crucial parts of the Indian power grid – for no apparent reason. On closer investigation, the strange conversation was the deliberately indirect route by which Chinese spies were interacting with malware they had previously buried deep inside the Indian power grid.”
Link? I am close to CCTV retailers, and Dahua and Hikvision are the only CCTV brands widely available, with the two exceptions of "CP Plus" and "Hawkvision", which are in all likelihood rebranded or made-in-China products.
https://www.amazon.in/s?k=cctv+system+4+channel
So what are your options? I have been contemplating getting a door phone + CCTV for my home for years, but problems like these keep me from investing in an ecosystem.
Edit: oh, looks like the pager attacks have their attention now.
https://trak.in/stories/pager-bombs-govt-can-ban-chinese-cct...
I guess time will tell, and then there is lobbying, so yeah.
Could you elaborate? What’s up with those brands?
Default YOLO models are stuck at 640x640, so literally any camera capable of at least that resolution will do. LLaVA, I believe, is about the same. You'd need Ubuntu and something that can run a LLaVA model in vaguely real time, so a 4090/4080.
How about Llama 3.2 Vision? Would it get better performance?
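To illustrate why a higher-resolution camera gains little at a 640x640 model input, here is a minimal sketch of letterbox-style resizing arithmetic: scale the frame to fit the square while keeping its aspect ratio, then pad the rest. This is a common preprocessing approach, not necessarily what this project does, and `letterbox_params` is a hypothetical helper name.

```python
def letterbox_params(src_w: int, src_h: int, dst: int = 640):
    """Compute how a src_w x src_h frame fits into a dst x dst square.

    Returns (scale, new_w, new_h, pad_x, pad_y), where pad_x / pad_y are the
    left / top padding in pixels. The 640 default mirrors the input size
    mentioned in the comment above; it is an assumption about the model.
    """
    scale = min(dst / src_w, dst / src_h)   # keep the aspect ratio
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_x = (dst - new_w) // 2              # center horizontally
    pad_y = (dst - new_h) // 2              # center vertically
    return scale, new_w, new_h, pad_x, pad_y


if __name__ == "__main__":
    # A 1080p frame shrinks to 640x360 and gets 140 px of padding top and bottom.
    print(letterbox_params(1920, 1080))
```

Under these assumptions a 1080p frame is reduced to one third of its linear size before the detector ever sees it.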
All I see, usually, is some AI YOLO algorithm applied to an offline video.
This is the first time that I've seen a "complete" setup. Any info to learn more about applying YOLO and similar models to real-time streams (whatever the format)?
Just stream it one frame at a time to the model and eat the latency: https://www.youtube.com/watch?v=IHbJcOex6dk if you need more hand holding.
There's a reason why there's a whole family of models from tiny to huge.
If you do it naively, your video frames will buffer while waiting to be consumed, causing a memory leak and an eventual crash (or a quick crash if you're running on a device with constrained resources).
You really need to have a thread consuming the frames and feeding them to a worker that can run on its own clock.
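One way to sketch the pattern described above is a single-slot "latest frame" holder: a reader thread overwrites the slot as fast as frames arrive, and the worker picks up whatever is newest when it is ready, so stale frames are dropped instead of piling up. Everything here (`LatestFrame`, `fake_camera`, the timings) is hypothetical and only stands in for a real camera and model.

```python
import threading
import time


class LatestFrame:
    """Holds only the most recent frame; older unread frames are dropped."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._seq = 0          # increments on every put()
        self._closed = False

    def put(self, frame):
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self, last_seq):
        """Block until a frame newer than last_seq exists.

        Returns (seq, frame), or None once the source is closed and drained.
        """
        with self._cond:
            while self._seq == last_seq and not self._closed:
                self._cond.wait()
            if self._seq == last_seq:
                return None
            return self._seq, self._frame


def reader(source, slot):
    """Consume frames as fast as the source produces them."""
    for frame in source:
        slot.put(frame)
    slot.close()


def worker(slot, process):
    """Run process() on the newest frame at the worker's own pace."""
    done, last_seq = [], 0
    while (item := slot.get(last_seq)) is not None:
        last_seq, frame = item
        process(frame)
        done.append(frame)
    return done


def fake_camera(n, interval):
    """Hypothetical stand-in for a camera: yields n integer 'frames'."""
    for i in range(n):
        time.sleep(interval)
        yield i


if __name__ == "__main__":
    slot = LatestFrame()
    t = threading.Thread(target=reader, args=(fake_camera(50, 0.005), slot))
    t.start()
    processed = worker(slot, lambda f: time.sleep(0.03))  # slow "model"
    t.join()
    print(f"processed {len(processed)} of 50 frames: {processed}")
```

Memory stays bounded because at most one frame is ever held; the trade-off is that the model skips frames whenever it is slower than the camera.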
That's not how loop devices work on Linux.
We’ve got an open source pipeline as part of inference[1] that handles the nuances (multithreading, batching, syncing, reconnecting) of running multiple real time streams (pass in an array of RTSP urls) for CV models like YOLO: https://blog.roboflow.com/vision-models-multiple-streams/
This repository seems to be exactly what you are asking for. It's YOLO analysis of video frames passed in through Real Time Streaming Protocol.