• yu3zhou4 4 hours ago

    Congrats! What hardware do you use to run the inference 24/7? I built a simpler version for running on low-end hardware [0] that recognizes whether there's a person on my parcel, so I know when someone has trespassed and I can trigger a siren, lights, etc.

    [0] https://github.com/jmaczan/yolov3-tiny-openvino

    • vaylian 41 minutes ago

      Hello from the privacy crowd! Please use this responsibly. Tech can be a lot of fun and I encourage you to play around with things and I appreciate it when you push the boundaries of what is technically feasible. But please be mindful that surveillance tech can also be used to oppress people and infringe on their freedoms. Use tech for good!

      • pmontra 4 hours ago

        This runs on a GeForce GTX 1060. A quick search says it draws 120 W. Maybe that's only the peak power consumption, but it's still a lot. Do commercial products, if there are any, consume that much power?

        • hcfman 3 hours ago

          I have something similar. It's not tracking though. Drawing around 10W on a pi, around 7W on a Jetson.
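To put the wattages quoted in this subthread on a common scale, here is a rough sketch that converts a constant power draw into energy per year. It assumes the device runs 24/7 at the quoted figure, which overstates the GTX 1060 case if 120 W is only its peak; `annual_kwh` is a hypothetical helper name.

```python
def annual_kwh(watts, hours_per_day=24.0):
    """Energy used in one year at a constant draw, in kilowatt-hours."""
    return watts * hours_per_day * 365 / 1000


# Figures quoted in the comments above (assumed constant, which is a simplification).
quoted_draw_watts = {
    "GTX 1060 (quoted 120 W)": 120,
    "Pi setup (quoted ~10 W)": 10,
    "Jetson setup (quoted ~7 W)": 7,
}

for label, watts in quoted_draw_watts.items():
    print(f"{label}: {annual_kwh(watts):.1f} kWh/year")
```

Under that assumption the 120 W figure works out to about 1,051 kWh per year, against roughly 88 kWh and 61 kWh for the 10 W and 7 W setups.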

          • 4ggr0 an hour ago

            not sure if i'm misunderstanding - you've got a similar GPU to a 1060 hooked up to a pi?

            • lelag an hour ago

              OP is probably using an AI accelerator like this: https://coral.ai/products/accelerator which works great on a Pi and uses very little power. It will do the YOLO part, but you can't really expect it to do the multimodal LLM part, although you could try to run Florence directly on the Pi too.

        • rocauc 5 hours ago

          A suggestion: I'd swap llava for Florence-2 for your open set text description. Florence-2 seems uniformly more descriptive in its outputs.

          • jerpint 2 hours ago

            I found grounding-dino better than Florence and faster

          • ferar 6 hours ago

            Can you specify ideal hardware (camera, computer) to deploy the solution? Thanks

            • skirmish 4 hours ago

              Here are hardware recommendations from another similar (and well established) project: [1] [2]. Even though they don't recommend Reolink cameras, I have both Amcrest and Reolink cameras working well with Frigate for more than a year now.

              [1] https://docs.frigate.video/frigate/hardware

              [2] https://github.com/blakeblackshear/frigate

              • moandcompany 6 hours ago

                You'll want to find an IP camera that supports RTSP, which most of them do.

                If your budget supports commercial style or commercial grade cameras, looking at Dahua or Hikvision manufactured cameras would be a good starting point to get an idea of specs, features, and cost.

                • meow_catrix 6 hours ago

                  Maybe don’t buy surveillance hardware from those brands

                  • sinuhe69 6 hours ago

                    Not OP, but the reason may be:

                    US FCC ban: The US Federal Communications Commission (FCC) banned Dahua and Hikvision from new equipment authorizations in November 2022. Most products that use electricity require FCC equipment authorizations; otherwise, they are illegal to import, sell, market, or use, even for private individuals. (Jul 5, 2024)

                    • hcfman 3 hours ago

                      Shame, they are the best cameras available.

                    • moandcompany 5 hours ago

                      A lot of the commercial-style or commercial-grade IP Cameras sold are rebadged Dahua or Hikvision products.

                      Compromised firmware or other backdoors are a concern for a wide range of products. With IP Cameras, a commonly recommended practice includes putting them on a non-internet accessible network, disabling any remote access, UPnP type features, etc. You can run IP cameras in an air-gapped configuration as well.

                      Home/consumer-grade cameras have plenty of shortcomings too.

                      • hcfman 3 hours ago

                        If they are rebadged, that's fine :)

                      • avh02 6 hours ago

                        You're going to have to explain the reasoning here

                        • meow_catrix 6 hours ago

                          “Analysts noticed that CCTV cameras in Taiwan and South Korea were digitally talking to crucial parts of the Indian power grid – for no apparent reason. On closer investigation, the strange conversation was the deliberately indirect route by which Chinese spies were interacting with malware they had previously buried deep inside the Indian power grid.”

                          • 2Gkashmiri 5 hours ago

                            link? i am close to CCTV retailers, and dahua and hikvision are the only widely available CCTV brands, with the two exceptions of "cp plus" and "hawkvision", which are in all likelihood rebranded or made-in-china products.

                            https://www.amazon.in/s?k=cctv+system+4+channel

                            so what are your options? i have been contemplating getting a door phone + cctv for my home for the past so many years but problems like these prevent me from investing into an ecosystem.

                            edit: oh. looks like the pager attacks have their attention now.

                            https://trak.in/stories/pager-bombs-govt-can-ban-chinese-cct...

                            i guess time will tell and then there is lobbying so yeah

                        • nativeit 6 hours ago

                          Could you elaborate? What’s up with those brands?

                      • llm_trw 6 hours ago

                        Default YOLO models are stuck at 640x640 input, so literally any camera that is capable of at least that resolution will do. Llava, I believe, is about the same. You'd need Ubuntu and something that can run a llava model in vaguely real time, so a 4090/4080.
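For context on the 640x640 point: a common preprocessing step for YOLO-style detectors is "letterboxing", where the frame is scaled to fit the square input while keeping its aspect ratio and the remainder is padded. The sketch below only computes the geometry; `letterbox_params` is a hypothetical helper, and this is not necessarily what the project under discussion does.

```python
def letterbox_params(src_w, src_h, dst=640):
    """Return (scale, (new_w, new_h), (pad_left, pad_top)) for fitting a
    src_w x src_h frame into a dst x dst square without distortion."""
    # Scale by the tighter of the two dimensions so the whole frame fits.
    scale = min(dst / src_w, dst / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # Split the leftover space evenly; any odd pixel goes to the right/bottom.
    pad_left = (dst - new_w) // 2
    pad_top = (dst - new_h) // 2
    return scale, (new_w, new_h), (pad_left, pad_top)


scale, resized, padding = letterbox_params(1920, 1080)
print(resized, padding)  # a 1080p frame becomes 640x360 with 140 px bars top and bottom
```

A 1080p or 4K camera therefore gets downscaled to the same 640x360 content area, which is why extra sensor resolution does not help unless the frame is cropped or tiled first.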

                      • nikolayasdf123 4 hours ago

                        How about Llama 3.2 Vision? Would it perform better?

                        • _giorgio_ 8 hours ago

                          All I see, usually, is some AI YOLO algorithm applied to an offline video.

                          This is the first time that I've seen a "complete" setup. Any info to learn more on applying YOLO and similar models to real time streams (whatever the format)?

                          • llm_trw 6 hours ago

                            Just stream it one frame at a time to the model and eat the latency: https://www.youtube.com/watch?v=IHbJcOex6dk if you need more hand holding.

                            There's a reason why there's a whole family of models from tiny to huge.

                            • yeldarb 6 hours ago

                              If you do it naively, your video frames will buffer while waiting to be consumed, causing a memory leak and an eventual crash (or a quick crash if you’re running on a device with constrained resources).

                              You really need to have a thread consuming the frames and feeding them to a worker that can run on its own clock.
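A minimal sketch of the pattern described above, using only the standard library: one thread reads frames as fast as the source produces them and overwrites a single "latest frame" slot, while the worker takes whatever is newest whenever it finishes its previous inference. The camera and the model are simulated with `time.sleep`; `LatestFrame`, `reader`, `worker`, and `run_demo` are hypothetical names, and a real setup would read from something like an RTSP capture loop instead.

```python
import threading
import time


class LatestFrame:
    """Holds only the newest frame; unread older frames are overwritten."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._seq = 0          # count of frames written so far
        self._closed = False

    def put(self, frame):
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self, last_seq):
        """Block until a frame newer than last_seq exists.
        Returns (seq, frame), or None once the source is closed and drained."""
        with self._cond:
            while self._seq == last_seq and not self._closed:
                self._cond.wait()
            if self._seq == last_seq:
                return None
            return self._seq, self._frame


def reader(slot, n_frames, interval):
    # Stand-in for a camera read loop; frames are just integers here.
    for i in range(n_frames):
        slot.put(i)
        time.sleep(interval)
    slot.close()


def worker(slot, infer_time):
    processed, last_seq = [], 0
    while True:
        item = slot.get(last_seq)
        if item is None:
            break
        last_seq, frame = item
        time.sleep(infer_time)  # stand-in for model inference
        processed.append(frame)
    return processed


def run_demo(n_frames=50, interval=0.002, infer_time=0.01):
    slot = LatestFrame()
    t = threading.Thread(target=reader, args=(slot, n_frames, interval))
    t.start()
    processed = worker(slot, infer_time)
    t.join()
    return processed


if __name__ == "__main__":
    frames = run_demo()
    print(f"processed {len(frames)} frames, newest was {frames[-1]}")
```

Memory stays constant because at most one frame is held at a time; the trade-off is that frames arriving while the model is busy are dropped rather than queued.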

                              • llm_trw 3 hours ago

                                That's not how loop devices work on Linux.

                            • yeldarb 6 hours ago

                              We’ve got an open source pipeline as part of inference[1] that handles the nuances (multithreading, batching, syncing, reconnecting) of running multiple real time streams (pass in an array of RTSP urls) for CV models like YOLO: https://blog.roboflow.com/vision-models-multiple-streams/

                              [1] https://github.com/roboflow/inference

                              • hug 7 hours ago

                                This repository seems to be exactly what you are asking for. It's YOLO analysis of video frames passed in through Real Time Streaming Protocol.