Comments Page - InfluxDB 3 Open Source Now in Public Alpha Under MIT/Apache 2 License

InfluxDB 3 Open Source Now in Public Alpha Under MIT/Apache 2 License (influxdata.com)
Submitted by otoolep 6 months ago

pauldix 6 months ago
Post author, cofounder and creator of InfluxDB here. Happy to answer questions in this thread.
I'm guessing there will be questions about the 72 hour limit. There are two things we're looking at:
First, we're considering a free tier of Enterprise for at-home and hobbyist usage, which doesn't have this limitation. This would be similar to what Tailscale does by offering a free usage plan for their commercial software.
Second, for Core, the open source build, we're working on an update that will let it query any 72 hour window of historical data. Right now it doesn't evict data, it all still exists on disk or object storage as Parquet files, but we remove the metadata information from RAM to keep things optimized for the most recent 72 hours.
When the update is done, you'll be able to write and query for any period of time. But an individual query will be limited to a 72 hour time range. This is a service protection mechanism because of how the data is organized.
A file gets created for every 10-minute block of time for each table. So 72 hours is 432 files, which is a lot of GET requests to S3 for a single query. We don't want to increase the range because of that. Running multiple queries to cover a longer range, or accessing the data from third-party clients, is still possible.
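To make the arithmetic above concrete, here is a rough Python sketch (not InfluxDB code). It only counts the 10-minute files per table that a query range would cover and splits a longer range into windows of at most 72 hours, each of which would go out as its own query. The function names `files_per_table` and `split_into_windows` are made up for illustration, and the count assumes the range is aligned to the 10-minute file boundaries.

```python
import math
from datetime import datetime, timedelta

# Figures taken from the comment above; everything else is a hypothetical helper.
FILE_BLOCK = timedelta(minutes=10)      # one file per table per 10-minute block
MAX_QUERY_RANGE = timedelta(hours=72)   # per-query time range limit


def files_per_table(query_range: timedelta) -> int:
    """Number of 10-minute files one table contributes to a query over query_range.

    Assumes the range starts on a file boundary; an unaligned range can touch one more file.
    """
    return math.ceil(query_range / FILE_BLOCK)


def split_into_windows(start: datetime, end: datetime,
                       max_range: timedelta = MAX_QUERY_RANGE):
    """Split [start, end) into consecutive windows no longer than max_range."""
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + max_range, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


if __name__ == "__main__":
    print(files_per_table(MAX_QUERY_RANGE))  # 432, matching the figure in the comment
    week = split_into_windows(datetime(2025, 1, 1), datetime(2025, 1, 8))
    for window_start, window_end in week:
        print(window_start, "->", window_end)
```

A client covering a week of data this way would issue three queries (72h, 72h, 24h) and combine the results itself.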
In Enterprise, our commercial product, we have a compactor that collapses these files into larger time blocks that also creates an index that the query engine can use.
Doing it this way was a deliberate choice so that we could have a permissively licensed open source project separate from the commercial product. If we put the compactor into the open, we'd have to put it under a source available license to limit usage so that we can still sell the database.
Our hope is that there's still an audience of users that will find Core useful on its own, even without any commercial relationship with us. It's not a full historical TSDB, but it's not intended to be. It's meant to be a recent data engine that can collect, process, monitor, ship, and store data paired with a fast analytical query engine against the recent buffer (or recently persisted buffer).
Happy to answer any followup questions about this or the release generally.
- mendocinox 6 months ago
  I haven't used influxdb in a project yet, but I'm a fan of its capabilities!
  The core-enterprise dichotomy seems more or less the same as what scylla had until recently. Does influxdata have different considerations from scylla that will allow influxdb to remain open source in the long term?
  pauldix 6 months ago
  We're open core and have been since 2016. We've deliberately limited the scope of what the open source project is supposed to do. It should be great at this use case of collecting, processing, storing, and querying recently buffered data.
  The commercial offering is the historical time series DB along with a bunch of other features around high availability, read replication, fine-grained security, and the compaction engine, which enables longer range queries and row-level deletes.
  I think Scylla had most of their DB in the open and then a small slice of Enterprise functionality (although I'm not super familiar with their product line).
  Ideally, we'd have many open source users and even our commercial customers would use the open source in addition to the commercial offering.
  But ultimately, it's about finding a sustainable business model that keeps more software coming. We have a preference for permissive open source over source available. In my view, we may as well create freemium rather than source available.
  With this version of InfluxDB, we've been able to invest heavily into Apache projects that lie at the core of it: Arrow, DataFusion, Parquet, and the object store crate, which we developed and donated to the ASF.
  We'd like to continue that work because we think that a highly performant, modular, vectorized query engine (i.e. DataFusion) should be a free commodity that's widely available and widely contributed to.
- lacker 6 months ago
  It's a curious way to differentiate between the open source and paid versions, but I guess you have to pick something.
  The 72 hour thing is new with 3.0, right? What were the main differences in the 2.0 version between open source and paid versions?
  pauldix 6 months ago
  2.0 was single server. Our paid offering of that is a usage based cloud platform that’s highly available and managed.
- jwillp 6 months ago
  What is the minimum resident RAM size per individual active unique series? Or what's a typical RSS RAM size for 10 or 100 million unique active series? How does unlimited cardinality avoid RAM exhaustion in this version?
  pauldix 6 months ago
  Core doesn't index the metadata so it uses less RAM for higher cardinality data. However, if you have 100M series and you're writing to all of them at the same time, you're going to need some amount of RAM just to buffer it all up and then ship it off to storage as Parquet. The Enterprise product has a compactor that creates indexes as it goes, but those indexes are lighter weight than those in v1 and v2. Also, users can specify which columns they want to appear in those indexes, so they can leave out high cardinality ones if they want to save on RAM. In v3 you can brute force the query against high cardinality data, unlike v1 & v2, which would eat up a ton of RAM to do so.
- JSTrading 6 months ago
  Excellent! Keep up the great work.