[5:59pm] notmyname: sdague: cdent: just saw you were talking about ceilometer + swift in your meeting
[5:59pm] cdent: yup notmyname: I'm supposed to explore the options with you.
[6:11pm] cdent: notmyname: did you get a chance yet to look at that stuff I pointed out yesterday?
[6:12pm] notmyname: cdent: ya. I gave a brief summary in here yesterday (your night)
[6:12pm] notmyname: let me find it
[6:12pm] cdent: thanks
[6:13pm] notmyname: cdent: https://gist.github.com/notmyname/296ee8f99b36c9a2b0e0
[6:13pm] • cdent clicks
[6:14pm] cdent: okay, yeah, there are multiple issues, not necessarily in the same vein
[6:15pm] cdent: one is that there is an ordering problem when the middleware comes from ceilometer
[6:15pm] cdent: the other is that the way the info is gathered is lame
[6:16pm] cdent: the hope is to kill both with one change
[6:16pm] cdent: the particular meters involved measure numbers of requests and their sizes
[6:16pm] cdent: does swift push that info out by other means, notmyname?
[6:17pm] notmyname: cdent: number of requests is already pushed out by swift via statsd (along with a _ton_ of other info)
[6:17pm] • cdent locates some code
[6:17pm] notmyname: cdent: http://docs.openstack.org/developer/swift/admin_guide.html#reporting-metrics-to-statsd
[6:19pm] cdent: If I recall correctly there's been concern in the past, notmyname, that that info was insufficiently dimensional. That is, it's counts of what happened, but not of who did it? Is that correct?
[6:19pm] notmyname: cdent: correct
[6:20pm] cdent: Then I would guess that's why the middleware was created: in order for the measures to be used for billing, "who" has to be in there somewhere.
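(Editorial aside: the statsd reporting mentioned above uses the standard statsd wire format, a tiny UDP text protocol. A minimal sketch of building and parsing a counter packet follows; the metric names are illustrative, not Swift's actual ones, which are listed in the admin guide linked above.)

```python
def statsd_counter(name, value=1, sample_rate=1.0):
    """Format a statsd counter packet: 'name:value|c' with an
    optional '|@rate' suffix when the metric is sampled."""
    packet = "%s:%s|c" % (name, value)
    if sample_rate < 1.0:
        packet += "|@%s" % sample_rate
    return packet

def parse_statsd(packet):
    """Parse a statsd packet back into (name, value, type, sample_rate)."""
    name, rest = packet.split(":", 1)
    parts = rest.split("|")
    value = float(parts[0])
    mtype = parts[1]
    rate = float(parts[2][1:]) if len(parts) > 2 else 1.0
    return name, value, mtype, rate
```

Note the format's limitation discussed in the chat: the packet carries only a metric name and a number, so there is nowhere to put per-tenant ("who") information without exploding the metric namespace.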
[6:20pm] notmyname: cdent: the statsd info is good for knowing the current (and historic) health of the cluster
[6:21pm] notmyname: cdent: and that (the ceilometer middleware) has led to lots of complaints about slowness and scale issues
[6:22pm] notmyname: cdent: I strongly believe that the sort of info useful for billing (who did something, rather than that something happened) should happen outside of the data path. specifically, that info should be gathered from access logs
[6:22pm] notmyname: cdent: pulling it from logs is (1) outside the data path and (2) auditable after the fact
[6:22pm] cdent: Yes, this is why we want your input. It's a known problem that we need to resolve in some fashion.
[6:22pm] cdent: We did determine that there are ways to speed up the middleware a _lot_, but then didn't do the work because of considerations for doing it completely differently.
[6:23pm] cdent: The issue with analysis of historical logs is that at least some people say it is not real-time enough
[6:24pm] notmyname: what are the requirements?
[6:25pm] cdent: An excellent question. I think the unfortunate answer is "continue to provide this information to people who are using it already."
[6:25pm] cdent: which is not quite right...
[6:26pm] notmyname: and as an initial argument against that position: you don't need data reporting for billing at a higher resolution than you're actually doing billing
[6:26pm] notmyname: are you doing billing on a per-minute basis? as in, you send bills every minute?
[6:27pm] cdent: I'm not in the best position to answer these questions, simply because I have very little experience with _real_ end users.
[6:27pm] cdent: But my understanding is that beyond billing there is the question of alarms.
[6:27pm] cdent: which have evaluation cycles
[6:28pm] notmyname: "alert if a customer uses more than X"? that sort of thing?
[6:28pm] cdent: yeah
[6:29pm] notmyname: isn't that why we have support for quotas in the various projects?
[6:29pm] notmyname: (I've long said that useful quotas would be implemented in auth and have input from billing, though)
[6:30pm] cdent: I guess it depends on what the alarms are doing.
[6:30pm] • cdent doesn't really know
[6:31pm] notmyname: right. if they are for protecting the system (eg making sure that resources aren't exhausted), then I don't think per-client info is useful, because you want it on a cluster basis. eg it doesn't matter if a user does more than X. it matters if a particular server is doing more than X
[6:32pm] cdent: Your position is much like mine: You want people to do things in a way that is correct.
[6:32pm] notmyname: all that being said, parsing logs can still give you very near-real-time info. I can imagine a system that reads a syslog stream rather than parsing a file on disk. that would give you much faster access to the log info
[6:32pm] cdent: Unfortunately it seems there are plenty of people who want to do things in a way that fits with existing precedents or tooling.
[6:34pm] notmyname: cdent: while I do want to be open-minded about other use cases, I'm also coming from the perspective of having written the billing integration for rackspace cloud files and been very involved with the design of the billing/chargeback system for all of swiftstack's customers
[6:35pm] cdent: Sure. I'm agnostic about all this stuff. Just trying to get us to some kind of solution that will make the majority happy.
[6:36pm] cdent: Do you think it would be possible to create an optional thing, running on the swift side, processing logs, that generated and packaged notifications for things like ceilometer to pick up? A latency of a few minutes would probably not be a big deal.
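(Editorial aside: the log-tailing approach notmyname describes could be sketched roughly like this: consume access-log records as they arrive, e.g. off a syslog stream, and roll up the per-account usage that the statsd counters lack. The record fields here are hypothetical, not Swift's actual proxy access-log format.)

```python
from collections import defaultdict

class UsageAggregator:
    """Accumulate per-account usage from a stream of access-log records."""

    def __init__(self):
        self.requests = defaultdict(int)
        self.bytes_in = defaultdict(int)
        self.bytes_out = defaultdict(int)

    def feed(self, record):
        """record: a dict with (assumed) keys 'account', 'bytes_in',
        'bytes_out' parsed from one access-log line."""
        acct = record["account"]
        self.requests[acct] += 1
        self.bytes_in[acct] += record.get("bytes_in", 0)
        self.bytes_out[acct] += record.get("bytes_out", 0)

    def snapshot(self):
        """Periodic rollup, suitable for packaging as notifications
        for a metering system to pick up."""
        return {a: {"requests": self.requests[a],
                    "bytes_in": self.bytes_in[a],
                    "bytes_out": self.bytes_out[a]}
                for a in self.requests}
```

A snapshot taken every minute or so would give the "few minutes of latency" service cdent asks about, while keeping collection entirely out of the proxy's data path.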
[6:36pm] cdent: Or if that seems unwise, what do you think about moving this conversation out to the mailing list, where we have more active participants (including folk who know the use cases better than I do)?
[6:37pm] cdent: The third option is for ceilo to go ahead and do the simple thing: extract the middleware to its own package and jack that requirement into swift's requirements in devstack
[6:38pm] notmyname: that 3rd option is basically the state we're in now, right? but maybe with some requirements thing fixed (I wasn't clear on the problem there)
[6:39pm] cdent: yeah, it's that with some requirements fixed: that would be a swift obligation, or at least a hack of some kind in devstack
[6:39pm] cdent: the goal is to get the middleware to upgrade/install when swift is upgraded/installed, not ceilometer
[6:39pm] cdent: as it is swift's pipeline that is engaged
[6:39pm] notmyname: what was the specific problem? what dependency requirement was being violated?
[6:41pm] notmyname: I'd love to work with ceilometer to pull existing info that swift produces. specifically, that's statsd metrics and logs. I don't expect swift to (re-)add log parsing functionality to its repo, at least not without a much deeper understanding of the problem and the agreed solution
[6:41pm] notmyname: (my answer to the general question above)
[6:42pm] cdent: there is the pragmatic problem and the cleanliness problem
[6:42pm] cdent: the pragmatic problem is that in grenade swift was being upgraded before ceilometer
[6:43pm] cdent: this meant that when swift was restarted, it was restarting with the _old_ version of ceilometer's copy of the middleware (and all the things it imports)
[6:43pm] cdent: for a year or more this just worked and nobody noticed. recently there was a change to how versions of things are handled and it blew up.
[6:43pm] notmyname: and ceilometer's middleware had backwards-incompatible changes to requirements?
[6:44pm] notmyname: s/requirements/dependencies/
[6:44pm] notmyname: so since grenade installs the newer set of global requirements, then it restarts swift and the middleware gets loaded and the dependencies don't work?
[6:44pm] cdent: by a very roundabout path (the middleware uses the pipeline (which it shouldn't, but see above about the ongoing debate on how best to fix it), and the pipeline has some weird dependencies), yes
[6:45pm] cdent: I'm not sure, but the quick fix was just to upgrade ceilometer first in grenade
[6:45pm] cdent: However, the cleanliness problem remains: project ceilometer horks around in project swift's wsgi pipeline. This is considered bad hygiene.
[6:46pm] notmyname: why?
[6:46pm] notmyname: why is "ceilometer has middleware for another project's pipeline" considered bad? it's not the first project to do that, and middleware is a good way to get extensibility
[6:46pm] cdent: sdague?
[6:46pm] cdent: I would assume because it leads to problems like the one being worked around (requiring ordering where you wouldn't expect it)
[6:46pm] clarkb: aiui because it's bringing in all of ceilometer rather than a clean ceilometer utility
[6:47pm] clarkb: it's fine for ceilometer to be in there, but you don't want all of ceilometer to do it
[6:47pm] cdent: thus the idea to move the middleware to its own package
[6:47pm] clarkb: because logically those services should be able to run independently of each other and only the middleware would be coupled to swift
[6:47pm] cdent: and put it in global requirements
[6:47pm] notmyname: that makes sense. which also tends to get solved by having a new ceilometermiddleware package, like keystone does
[6:47pm] cdent: yes
[6:48pm] cdent: but if we're going to create a new tool
[6:48pm] cdent: then we may as well make the right tool
[6:48pm] cdent: not one that swift doesn't like
[6:48pm] notmyname: :-)
[6:48pm] clarkb: +1
[6:48pm] cdent: because nobody wants to go around slowing stuff down etc
[6:49pm] cdent: so one option is to rip the pipeline code out of the middleware and just send a notification (actually 2) for every single request
[6:50pm] cdent: there are performance concerns with this, but I'd like to think that if we can't load the message bus with notifications are the rate of _many_ per second, then we've got serious problems on our hands and need a new message bus...
[6:50pm] cdent: s/are the rate/at the rate/
[6:51pm] notmyname: what are the 2 notifications?
[6:51pm] cdent: 'storage.api.request' is an indicator that a request was made
[6:52pm] cdent: 'storage.objects.incoming.bytes' and 'storage.objects.outgoing.bytes' are the other two; only one of them tends to happen
[6:52pm] notmyname: hmm... those names don't have any client/user/tenant info in them
[6:52pm] cdent: code (old, tired, bad): https://github.com/openstack/ceilometer/blob/master/ceilometer/objectstore/swift_middleware.py
[6:52pm] cdent: that's right notmyname, that's because they are meters: they have a payload
[6:52pm] notmyname: ya I see it now
[6:54pm] cdent: so an option is to not use the pipeline_manager, just call the messaging bus directly to put a datastructure on the bus
[6:54pm] cdent: the central-agent in ceilometer can hear that stuff and do its thing
[6:56pm] cdent: there's an old bug about this that has some relevant links: https://bugs.launchpad.net/ceilometer/+bug/1285388
[6:56pm] cdent: it has one of the common comments: "rewrite using oslo middleware"
[6:56pm] notmyname: the main issue I have with that is... well, 2 things. one is the fact it's in the data path (read and write).
[6:56pm] notmyname: the other is that the cluster throughput will be limited by the capacity of that data bus (because of point 1)
[6:57pm] notmyname: or did I miss something?
[6:57pm] cdent: I was under the impression that there is now some fancy-pants way to drop things on the bus in an async way.
[6:57pm] cdent: The old rpc way required acknowledgment of receipt
[6:57pm] cdent: the new way is fire and forget
[6:59pm] cdent: notmyname: given that anything will be better than the current situation, are you cool with an incremental set of steps in the right direction?
[7:00pm] cdent: the first being extraction to its own package?
[7:00pm] notmyname: ya, of course
[7:01pm] cdent: assuming we remain in the data path (for now), can you (or sdague or clarkb) think of how best to make that as efficient as possible?
[7:01pm] notmyname: mostly I'm not too interested in it while ceilometer is inventing new ways of collecting info that is already published by swift. especially when those new ways are directly impacting the performance and scalability of swift clusters
[7:02pm] cdent: I thought we established that this particular data wasn't already available?
[7:02pm] cdent: If it is, then please point me at how to get it.
[7:02pm] notmyname: but I have no problems with some different ceilometer package. and I'd be happy to work with the ceilometer team to get the info they need from existing statsd and logs
[7:05pm] cdent: notmyname: you cool with me slicing and dicing this chat into a paste somewhere?
[7:06pm] notmyname: cdent: ya
[7:08pm] cdent: notmyname: My takeaways are basically this (please let me know if/where you disagree):
[7:08pm] cdent: you're happy to help
[7:08pm] cdent: you'd prefer a solution that does not impact the performance of swift (and that is not in the read/write path)
[7:09pm] cdent: short term, you're okay with the current functionality moving to its own package
[7:09pm] cdent: ideally that functionality would be replaced by collecting the data from existing statsd and log handling processes
[7:10pm] notmyname: cdent: yes, that sounds like a good summary with all my cranky bits edited out. thanks :-)
[7:10pm] • cdent is a huge fan of crankiness
[7:11pm] cdent: One of the bizarre side effects of this open source project being corporate open source is that there's far less open crankiness than I would expect in a project of this type.
[7:12pm] notmyname: heh
[7:12pm] mriedem: hey, i take offense to that
[7:12pm] cdent: mriedem++
[7:12pm] mriedem: and get off my damn lawn
[7:12pm] cdent: I thought this was _my_ lawn
[7:12pm] • cdent gets shotgun
[7:12pm] notmyname: cdent: the trick is balancing crankiness and rudeness. I don't mind being cranky if needed. I don't want to be rude :-)
[7:13pm] cdent: It wouldn't be quite so weird if it wasn't clear there was quite a lot of underlying, closed crankiness.
[7:13pm] cdent: No rudeness perceived here. A fine chat.
[7:13pm] notmyname: lol
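(Editorial aside: the interim approach discussed above, a middleware stripped of the pipeline_manager that emits the named meters, combined with fire-and-forget delivery so the data path never blocks on the bus, could be sketched roughly as below. The `publish` callable stands in for a real bus driver such as an oslo.messaging notifier; the tenant header and sample payloads are simplified assumptions, not the actual ceilometer code linked above.)

```python
import queue
import threading

class FireAndForgetNotifier:
    """The request path only enqueues samples; a daemon thread drains them
    to the bus. If the queue fills, samples are shed rather than blocking
    the proxy, so cluster throughput is not bounded by bus capacity."""

    def __init__(self, publish, maxsize=1000):
        self._q = queue.Queue(maxsize=maxsize)
        self._publish = publish  # stand-in for a real message-bus driver
        self.dropped = 0
        threading.Thread(target=self._drain, daemon=True).start()

    def notify(self, sample):
        """Called from the data path: never blocks, never raises."""
        try:
            self._q.put_nowait(sample)
        except queue.Full:
            self.dropped += 1  # shedding load beats slowing swift down

    def flush(self):
        """Block until everything queued so far has been published."""
        self._q.join()

    def _drain(self):
        while True:
            self._publish(self._q.get())
            self._q.task_done()

def make_metering_middleware(app, notifier):
    """Wrap a WSGI app: per request, emit a 'storage.api.request' sample
    plus incoming/outgoing byte samples, each carrying the tenant in its
    payload. (Generator-style middleware, simplified from PEP 3333.)"""
    def middleware(environ, start_response):
        bytes_in = int(environ.get("CONTENT_LENGTH") or 0)
        tenant = environ.get("HTTP_X_TENANT_ID", "unknown")
        bytes_out = 0
        for chunk in app(environ, start_response):
            bytes_out += len(chunk)
            yield chunk
        # response fully streamed; enqueue samples without blocking
        notifier.notify({"meter": "storage.api.request",
                         "tenant": tenant, "value": 1})
        if bytes_in:
            notifier.notify({"meter": "storage.objects.incoming.bytes",
                             "tenant": tenant, "value": bytes_in})
        if bytes_out:
            notifier.notify({"meter": "storage.objects.outgoing.bytes",
                             "tenant": tenant, "value": bytes_out})
    return middleware
```

This addresses the ordering and packaging complaints (no pipeline_manager, no import of all of ceilometer) but, as notmyname notes, it is still in the data path; the log-based approach remains the preferred long-term direction.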