Comments Page - Video Surveillance with YOLO+llava

Video Surveillance with YOLO+llava (github.com)
Submitted by psychip 10 hours ago

yu3zhou4 4 hours ago
Congrats! What hardware do you use to run the inference 24/7? I built a simpler version for low-end hardware [0] that recognizes whether there's a person on my parcel, so I know when someone has trespassed and can trigger a siren, lights, etc.
[0] https://github.com/jmaczan/yolov3-tiny-openvino
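The "person on my parcel, then trigger a siren" idea can be sketched as a small debounce over per-frame detections. This is an illustration only, not code from the linked repository: the `Detection` and `PersonAlarm` names, the 0.5 confidence threshold, and the three-frame requirement are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # class name reported by the detector, e.g. "person"
    confidence: float  # detector score in [0, 1]


class PersonAlarm:
    """Hypothetical helper: fire only after a person appears in N consecutive frames."""

    def __init__(self, min_confidence: float = 0.5, frames_required: int = 3):
        self.min_confidence = min_confidence
        self.frames_required = frames_required
        self._streak = 0  # consecutive frames containing a confident person detection

    def update(self, detections: list[Detection]) -> bool:
        """Feed one frame's detections; return True while the alarm condition holds."""
        person_seen = any(
            d.label == "person" and d.confidence >= self.min_confidence
            for d in detections
        )
        # Any frame without a confident person resets the streak.
        self._streak = self._streak + 1 if person_seen else 0
        return self._streak >= self.frames_required
```

Requiring several consecutive hits is one simple way to avoid firing a siren on a single false positive; the right threshold depends on the frame rate and the detector.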
vaylian 41 minutes ago
Hello from the privacy crowd! Please use this responsibly. Tech can be a lot of fun and I encourage you to play around with things and I appreciate it when you push the boundaries of what is technically feasible. But please be mindful that surveillance tech can also be used to oppress people and infringe on their freedoms. Use tech for good!
pmontra 4 hours ago
This runs on a GeForce GTX 1060. A quick search says it draws 120 W. Maybe that's only the peak power consumption, but it's still a lot. Do commercial products, if there are any, consume that much power?
- hcfman 3 hours ago
  I have something similar. It's not tracking, though. It draws around 10 W on a Pi and around 7 W on a Jetson.
  4ggr0 an hour ago
  not sure if i'm misunderstanding - you've got a similar GPU to a 1060 hooked up to a pi?
  lelag an hour ago
  OP is probably using an AI accelerator like this: https://coral.ai/products/accelerator which works great on a Pi and uses very little power. It will do the YOLO part, but you can't really expect it to do the multimodal LLM part, although you could try to run Florence directly on the Pi too.
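To put the wattage figures in this subthread in perspective, here is a rough yearly-energy calculation. It assumes each device draws the quoted power continuously, which overstates the GTX 1060 if 120 W is only its peak, and it ignores the rest of the host system.

```python
HOURS_PER_YEAR = 24 * 365  # non-leap year, running 24/7


def yearly_kwh(watts: float) -> float:
    """Energy used in one year of continuous operation, in kWh."""
    return watts * HOURS_PER_YEAR / 1000


# Figures quoted in the thread; the labels are only for display.
for name, watts in [("GTX 1060 (quoted 120 W)", 120), ("Pi setup", 10), ("Jetson setup", 7)]:
    print(f"{name}: {yearly_kwh(watts):.1f} kWh/year")
```

Under those assumptions the results are about 1051 kWh/year for the GPU versus roughly 88 and 61 kWh/year for the Pi and Jetson setups.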
rocauc 5 hours ago
A suggestion: I'd swap llava for Florence-2 for your open set text description. Florence-2 seems uniformly more descriptive in its outputs.
- jerpint 2 hours ago
  I found Grounding DINO better and faster than Florence.
ferar 6 hours ago
Can you specify ideal hardware (camera, computer) to deploy the solution? Thanks
- skirmish 4 hours ago
  Here are hardware recommendations from another similar (and well-established) project: [1] [2]. Even though they don't recommend Reolink cameras, I have had both Amcrest and Reolink cameras working well with Frigate for more than a year now.
  [1] https://docs.frigate.video/frigate/hardware
  [2] https://github.com/blakeblackshear/frigate
- moandcompany 6 hours ago
  You'll want to find an IP Camera that supports the RTSP protocol, which is most of them.
  If your budget supports commercial style or commercial grade cameras, looking at Dahua or Hikvision manufactured cameras would be a good starting point to get an idea of specs, features, and cost.
  meow_catrix 6 hours ago
  Maybe don’t buy surveillance hardware from those brands
  sinuhe69 6 hours ago
  Not OP, but the reason may be:
  US FCC ban: The US Federal Communications Commission (FCC) banned Dahua and Hikvision from new equipment authorizations in November 2022. Most products that use electricity require FCC equipment authorizations; otherwise, they are illegal to import, sell, market, or use, even for private individuals. (Jul 5, 2024)
  hcfman 3 hours ago
  Shame, they are the best cameras available.
  moandcompany 5 hours ago
  A lot of the commercial-style or commercial-grade IP Cameras sold are rebadged Dahua or Hikvision products.
  Compromised firmware or other backdoors are a concern for a wide range of products. With IP Cameras, a commonly recommended practice includes putting them on a non-internet accessible network, disabling any remote access, UPnP type features, etc. You can run IP cameras in an air-gapped configuration as well.
  Home/consumer-grade cameras have plenty of shortcomings too.
  hcfman 3 hours ago
  If they are rebadged, that's fine :)
  avh02 6 hours ago
  You're going to have to explain the reasoning here
  meow_catrix 6 hours ago
  “Analysts noticed that CCTV cameras in Taiwan and South Korea were digitally talking to crucial parts of the Indian power grid – for no apparent reason. On closer investigation, the strange conversation was the deliberately indirect route by which Chinese spies were interacting with malware they had previously buried deep inside the Indian power grid.”
  2Gkashmiri 5 hours ago
  link? i am close to CCTV retailers, and dahua and hikvision are the only CCTV brands widely available, with the two exceptions of "cp plus" and "hawkvision", which are in all likelihood rebranded or made-in-china products.
  https://www.amazon.in/s?k=cctv+system+4+channel
  so what are your options? i have been contemplating getting a door phone + cctv for my home for the past so many years but problems like these prevent me from investing into an ecosystem.
  edit: oh. looks like the pager attacks have their attention now.
  https://trak.in/stories/pager-bombs-govt-can-ban-chinese-cct...
  i guess time will tell and then there is lobbying so yeah
  nativeit 6 hours ago
  Could you elaborate? What’s up with those brands?
- llm_trw 6 hours ago
  Default YOLO models are stuck at 640x640, so literally any camera that is at least capable of that resolution will do. Llava, I believe, is about the same. You'd need Ubuntu and something that can run a llava model in vaguely real time, so a 4090/4080.
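To see what a 640x640 model input means for a higher-resolution camera, here is a sketch of the usual "letterbox" arithmetic: scale the frame to fit inside the square while keeping its aspect ratio, then pad the rest. This assumes letterbox preprocessing, which is common in YOLO implementations but is not confirmed for this project; `letterbox_geometry` is a hypothetical helper name.

```python
def letterbox_geometry(width: int, height: int, target: int = 640):
    """Return (scale, resized_w, resized_h, pad_x, pad_y) for fitting a frame
    into a target x target square while preserving its aspect ratio."""
    scale = min(target / width, target / height)
    resized_w = round(width * scale)
    resized_h = round(height * scale)
    pad_x = target - resized_w  # total horizontal padding, split left/right
    pad_y = target - resized_h  # total vertical padding, split top/bottom
    return scale, resized_w, resized_h, pad_x, pad_y


scale, w, h, pad_x, pad_y = letterbox_geometry(1920, 1080)
print(f"1920x1080 -> {w}x{h}, scale {scale:.3f}, padding {pad_x}x{pad_y}")
```

For a 1080p frame the image shrinks to 640x360 with 280 rows of padding, so an object 90 pixels tall in the original is only about 30 pixels tall when the model sees it. Extra camera resolution mostly helps if you crop regions of interest before inference.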
nikolayasdf123 4 hours ago
how about llama3.2 vision? should it get better performance?
_giorgio_ 8 hours ago
All I see, usually, is some AI YOLO algorithm applied to an offline video.
This is the first time that I've seen a "complete" setup. Any info to learn more on applying YOLO and similar models to real time streams (whatever the format)?
- llm_trw 6 hours ago
  Just stream it one frame at a time to the model and eat the latency: https://www.youtube.com/watch?v=IHbJcOex6dk if you need more hand holding.
  There's a reason why there's a whole family of models from tiny to huge.
  yeldarb 6 hours ago
  If you do it naively, your video frames will buffer while waiting to be consumed, causing a memory leak and an eventual crash (or a quick crash if you're running on a device with constrained resources).
  You really need to have a thread consuming the frames and feeding them to a worker that can run on its own clock.
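One way to sketch the "reader thread plus worker on its own clock" pattern is a single-slot buffer that always holds the newest frame, so unread frames are overwritten instead of queued. This is a minimal stdlib-only illustration, not the code of any project mentioned here: `LatestFrame`, `run_pipeline`, `read_frame`, and `infer` are hypothetical names, and `read_frame` is assumed to return `None` at end of stream.

```python
import threading


class LatestFrame:
    """Holds only the most recent frame; older unread frames are overwritten."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._seq = 0         # number of frames written so far
        self._closed = False

    def put(self, frame):
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self, last_seq):
        """Block until a frame newer than last_seq exists; return (seq, frame),
        or (last_seq, None) once the source is closed with nothing new."""
        with self._cond:
            self._cond.wait_for(lambda: self._seq > last_seq or self._closed)
            if self._seq > last_seq:
                return self._seq, self._frame
            return last_seq, None


def run_pipeline(read_frame, infer):
    """Run infer() on the freshest available frame until the stream ends."""
    slot = LatestFrame()

    def reader():
        # Drain the source as fast as it produces frames.
        while (frame := read_frame()) is not None:
            slot.put(frame)
        slot.close()

    threading.Thread(target=reader, daemon=True).start()

    results, seq = [], 0
    while True:
        seq, frame = slot.get(seq)  # skips any frames that arrived during infer()
        if frame is None:
            break
        results.append(infer(frame))
    return results
```

With OpenCV, `read_frame` could wrap `cv2.VideoCapture(url).read()`, which returns a success flag and the frame. The trade-off is deliberate: memory stays constant and latency stays low, at the cost of silently dropping frames whenever inference is slower than the camera.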
  llm_trw 3 hours ago
  That's not how loop devices work on Linux.
- yeldarb 6 hours ago
  We’ve got an open source pipeline as part of inference[1] that handles the nuances (multithreading, batching, syncing, reconnecting) of running multiple real time streams (pass in an array of RTSP urls) for CV models like YOLO: https://blog.roboflow.com/vision-models-multiple-streams/
  [1] https://github.com/roboflow/inference
- hug 7 hours ago
  This repository seems to be exactly what you are asking for. It's YOLO analysis of video frames passed in through Real Time Streaming Protocol.