Archive for April, 2013

April 23, 2013

"It is not enough to observe, experiment, theorize, calculate and communicate;…

"It is not enough to observe, experiment, theorize, calculate and communicate; we must also argue, criticize, debate, expound, summarize, and otherwise transform the information that we have obtained individually into reliable, well established, public knowledge."
– John Ziman. 1969. "Information, Communication, Knowledge," Nature 224: 318-324; abstract online at <http://bit.ly/cNPB1d>.

Thanks to +Richard Hake for this.

Embedded link: Richard Hake's Google+ post announcing "Symbiosis: A Standardista Endorses a Direct Instructionist" [Hake (2013)].

April 22, 2013

Think of the internet as the Library of Alexandria

but scattered around, with all the contents shuffled. #Discoverability is a major problem.

Imagine that we do, in fact, already have something like the Agora, where free and full discussion about the problems and issues of our day is taking place, 24/7. Again: #discoverability … tweets, FB replies, blog comments … all a huge incoherent cloud. And worse? maybe worst of all? the good stuff is flooded out by a combination of trivial blather and manipulative marketing rhetoric.

Now imagine that there were a way to sort through this.

Want just tweets, for entertainment? You've got it. Chit-chat like the FB stream? You've got it.

Care about schools funding good programs for your kids? Controlling that creep next door with the collection of assault weapons? Banks that manipulate markets in a way that strips your wallet while bulging their pals'? Well … no can do. Could do … could. If the 20% who cared actually got what they needed to work together on making the world safer, and fairer, and more sane.

That's why I created Protension. (It's a double pun. "Tension" around issues, with everyone saying others are wrong. Against what's being said. We can turn that around to be positive and constructive. So pro-tension. And "contention" … some points are contentious, twitchy, hot-button. We can drill into that in a way that is, again, positive and constructive. Well … right now we can't. But we could.)

cc: +Tim Bonnemann +Martijn Russchen +Tiago Peixoto +Merijn Terheggen

April 13, 2013

Classification and Routing

I just spent some time thinking about 1 to many notification.

I agree with what you wrote here:
> The solution is a better coordination of the many flares to
> a single flare that summarizes the entire domino-effect of failures.

But I found myself wondering if you had mis-stated the situation here:
> _Well, the problem isn't that we don't need flares, the problem is we
> have too many flares._
Literally true? I don't think so.

I think the reason this matters is that it's experienced as "too many flares".
I'd reformulate this way: everything is reported with the same priority (null) and routing (broadcast), so actually there aren't any flares at all!

FWIW at my DEW Line site I had 3 levels of audio alert. The 1st was a loud bell ringing /dong … dong … dong/ … one strike every 2 or 3 seconds. Quite a while later (a minute and a half?) a buzzer would sound. Still later the phone would ring. That was Cheyenne Mountain … a shift officer (they all called themselves "Snoopy"; such a good nick, because snooping is exactly what they were doing, huh huh) … and that was not a good thing. He was checking to see if I'd dropped the ball.
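
Here's a rough sketch of what I mean by priority and routing, loosely modeled on those three escalation levels. The level names, delays, and targets are invented for illustration; this isn't from any real monitoring stack.

    import time

    # Hypothetical escalation ladder, loosely modeled on bell -> buzzer -> phone.
    # (seconds unacknowledged before this level is reached, channel, who hears it)
    ESCALATION = [
        (0,   "bell",   "on-duty console"),
        (90,  "buzzer", "site supervisor"),
        (300, "phone",  "off-site watch officer"),
    ]

    def route(alert, raised_at, acknowledged=False, now=None):
        """Route one alert to whichever channels its age has reached.

        Nothing is broadcast with null priority: every alert carries a
        timestamp, and routing is a function of how long it has gone
        unacknowledged.
        """
        if acknowledged:
            return []
        age = (now or time.time()) - raised_at
        return [(channel, target, alert)
                for delay, channel, target in ESCALATION if age >= delay]

    # An alert raised two minutes ago and never acknowledged has reached
    # the bell and the buzzer, but not yet the phone.
    print(route("radar feed lost", raised_at=time.time() - 120))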

But see, I don't know for a fact that I have the scenario right at the level of operations.
Do different failure modes trigger signals that are qualitatively different?

Addendum: There was no real response to this. There was a reply … which is better than stony silence, of course. But nothing like a response that acknowledged, in even some small way, what I had written.

April 13, 2013

Trying to connect with a certain person by email I wrote "Triage and decision…

Trying to connect with a certain person by email I wrote "Triage and decision support"

Situation: our shiny new-fangled aircraft landing system is in place at the foot of a runway in a narrow mountain valley.

    An aircraft is inbound. It will be doing an unusually steep approach because that's what our system allows. It's snowing, but our system will guide the pilot past his usual decision point because, again, that's what our system allows.

    BITE (built-in test equipment) detects a small set of parameters out of range.
Q: How to respond? Whom to signal? With what? What to do?
A1: Flash a light on the local remote unit and send data.
A2: Shut the system down, conforming with the prime mandate: thou shalt not transmit false data.
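
A rough sketch of that decision in code … purely illustrative. The parameter names, the acceptable ranges, and the idea of keying the A1/A2 choice on whether an approach is in progress are my own assumptions, not the actual system logic.

    # Purely illustrative: invented parameter names, ranges, and decision rule.
    LIMITS = {
        "glideslope_deviation_deg": (-0.5, 0.5),
        "signal_integrity_pct": (98.0, 100.0),
    }

    def bite_response(readings, approach_in_progress):
        """Choose between A1 (alert + data) and A2 (shut down) from BITE readings."""
        out_of_range = [
            name for name, (low, high) in LIMITS.items()
            if not low <= readings[name] <= high
        ]
        if not out_of_range:
            return "nominal", out_of_range
        if approach_in_progress:
            # Prime mandate: thou shalt not transmit false data.
            return "A2: shut the system down", out_of_range
        return "A1: flash the local remote unit and send data", out_of_range

    # An aircraft is inbound and the glideslope reading has drifted out of range.
    print(bite_response(
        {"glideslope_deviation_deg": 0.7, "signal_integrity_pct": 99.2},
        approach_in_progress=True,
    ))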

Maybe I'm dreaming in technicolor. But this seems to me analogous to cloud monitoring.

Please let me know if I'm dead wrong and I'll just back off.

April 13, 2013

re: Salience

    What's turning me on right now is the idea of an effective real-time alert system. #Notification (I really like the work that @RTWworld is doing.)
    But the background is the idea that sets of "failure alerts" (you have nomenclature for that? cluster telemetry and such?) can represent a pre-identified "failure mode".

Here's what I wrote as a comment on "Measure Anything, Measure Everything" (<codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/#comment-16748>).
(This was by mistake; I meant to comment on +SandyWalsh's "The Monitoring Stack (the state of the art)" (<sandywalsh.com/2013/04/the-monitoring-stack-state-of-art.html>).)

    Something that maybe few people know about: Failure Modes, Effects
    and Criticality Analysis (FMECA). Slightly related to "measure everything";
    the idea is to analyze every chunk ("component", "unit", slice it as you
    will) to get a sense of how important it is. Sort of like triage before
    the failure.

Mean Time Between Failures and Mean Time To Repair are what we used (hardware; avionics). Stir in criticality, i.e. the consequences of error/failure.
This ends up giving you a really good idea of what you need to focus on, i.e. the worst case would be something that's likely to fail soon, a PITA to replace, and devastating in effect.
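
A rough sketch of that "triage before the failure" arithmetic, with made-up components and numbers … not any standard FMECA scoring, just the shape of the idea:

    # Made-up components and numbers, just to show the shape of the idea:
    # likelihood (1 / MTBF), pain to fix (MTTR), criticality of the consequences.
    components = [
        # (name, MTBF hours, MTTR hours, criticality 1..10)
        ("disk array",          8000, 4.0, 9),
        ("edge load balancer", 20000, 1.0, 7),
        ("metrics collector",  12000, 2.0, 3),
    ]

    def focus_score(mtbf, mttr, criticality):
        """Higher score = likely to fail soon, a PITA to replace, devastating in effect."""
        return (1.0 / mtbf) * mttr * criticality

    # Rank what deserves attention first.
    for name, mtbf, mttr, crit in sorted(
            components, key=lambda c: focus_score(*c[1:]), reverse=True):
        print(f"{name:20s} {focus_score(mtbf, mttr, crit):.5f}")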

April 13, 2013

> the problem isn't that we don't need flares, the problem is we have

> the problem isn't that we don't need flares, the problem is we have
> too many flares. The solution is a better coordination of the many flares
> to a single flare that summarizes the entire domino-effect of failures.
Right! So: "Information is data that makes a difference."

I'm not sure, but it sounds to me (#Nomenclature) like when you say "too many flares" you're talking about a whole bunch of "real-time operational metrics".
But every datum shouldn't become a flare … an alert. That's what my subject line meant to imply. (Context: I found what I was responding to in your blog.)
For me a "flare" signifies a pre-specified condition, i.e. "that freakin' HD has failed".

So yes, what you write about here is exactly what I guessed / intuited.

Example (from the avionics R&D project I worked on in Sydney … Micronav International, in Point Edward): any known failure mode will create a recognizable set of readings in our Built In Testing Equipment.
Q: How to take that data and produce information meaningful to the attendant? (BTW: we finished the design for BITE but unfortunately the project failed before we got to that next stage.)
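
A rough sketch of the kind of mapping I mean … the signatures and readings below are hypothetical, nothing from the actual Micronav BITE design:

    # Hypothetical failure-mode signatures: a recognizable set of readings
    # maps to one flare that means something to the attendant.
    SIGNATURES = {
        "localizer transmitter degraded": {"tx_power_low": True, "vswr_high": True},
        "antenna feed fault": {"vswr_high": True, "reflected_power_high": True},
    }

    def flares(readings):
        """Individual readings are just data; only a pre-identified pattern becomes a flare."""
        return [
            name for name, signature in SIGNATURES.items()
            if all(readings.get(key) == value for key, value in signature.items())
        ]

    # Two raw readings are abnormal, but only one known failure mode matches.
    print(flares({"tx_power_low": True, "vswr_high": True, "reflected_power_high": False}))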

> Currently, that's an unsolved problem, but people are thinking about it.
Well I'd love to chat with those people! 🙂

I was using #monitoring/#notification … it seems to me that's accurate … that it failed to communicate makes me wonder just where I stepped into cartoon fiction.

> _If you use PagerDuty or Nagios you'll know that it comes with an
> inbox full of noise._
A) Nice to see familiar nomenclature. A signal that isn't significant isn't noise … but it sure ain't useful!

B) I haven't "used" anything. I only realized 3 days ago that Rackspace (which I've known of for years) is deeply involved in OpenStack (which is quite new to me). Looking into that got me here.
Ganglia made perfect sense to me.
Your videos were also totally meaningful. (From memory … StackTach and Stacky?) BTW I left a question about Stacky on one of your YouTube videos.

FWIW I have Nagios docs loaded in a set of tabs; just now reading up on Ceilometer to get a sense of the plumbing.

> How to make that salient is key.
Contact … solid copy … salience = information; irrelevant "operational metrics" ain't. (Can't call those metrics noise/static since they're not random / entropic.)

But yes, precisely that: the way I used the terms, "information" is a sub-set of all readings.

FWIW I first encountered this when babysitting NORAD/SAC multiplexing. (DEW Line … I'm old.) Channels, groups, super-groups … amazing similarities with cloud / instances / servers.

Sure, an overview of all metrics would give an experienced operator a sense of system status as a whole, but what was paramount was to have "flares" that were diagnostic.

April 13, 2013

This pretty much captures it:

    given some sort of failure, the system writing to the log is not enough.
    Data, and metrics presenting that data as information, are likewise not
    enough. What would exhaust the need is some sort of alert or flare,
    a message of some sort.

Maybe there's no UseCase. Maybe folk managing a cloud don't need alerts or flares or messages … which makes no sense to me, but hey I'm out of my element here.

Trying to connect with folk who don't just //talk// about the stuff but are actually doing it.
It's non-sexy … a definite fail when it comes to #AttentionEconomy … 20/80 … less attractive than EdTech.
But here I'm talking about my project, not NoGuff.

Real-time comms is what turns me on. I guess I was caught up by the fantasy of what could be.
#EIS [that's Executive Information Services]

April 13, 2013

NoGuff

It's about "information" contra "data".

re: #monitoring / #notification

When I think of cloud monitoring (Stacky and Ganglia are what I've spidered in the past day) it seems to me that there's an aspect of this task/chore/need that's very similar to something core to my project. [1]

What I imagine is that, given some sort of failure, the system writing to the log is not enough. Data, and metrics presenting that data as information, are likewise not enough. What would exhaust the need is some sort of alert or flare, a message of some sort. Notification.

Whatever techniques are used for distributing those messages (What gets sent to whom, by what means and in what form?) … that's very, very close to the core of my work. So: I figure there might be something here that is mutually beneficial.

But maybe I've misread the situation entirely!

[1] "my project" = GNodal; Protension; Exhibitum … a "discourse-based decision support system". When I imagined a system for propagating system outages I came up with the name "NoDuff", an infantry radio term/procedure word we used to signify "This is not a test".

NB: In email I can use superscript, as well as bold and italics. I can also create links. There's nothing good about the GUI Google has given us here. It's a test … to see how we feel about eating shit from a spoon.
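
To make "what gets sent to whom, by what means and in what form" concrete, here's a bare-bones sketch. The subscriptions, channels, and message shapes are invented for illustration; this isn't GNodal/Protension code.

    # Invented subscriptions: who gets what, by what means, and in what form.
    SUBSCRIPTIONS = [
        ("on-call engineer", lambda f: f["severity"] >= 8,       "sms",   "one line"),
        ("storage team",     lambda f: f["domain"] == "storage", "email", "full detail"),
        ("everyone",         lambda f: True,                     "log",   "raw record"),
    ]

    def distribute(flare):
        """Turn one flare into the messages that should go out: to whom, via what, in what form."""
        return [
            {"to": who, "via": channel, "form": form, "about": flare["title"]}
            for who, matches, channel, form in SUBSCRIPTIONS
            if matches(flare)
        ]

    flare = {"title": "that freakin' HD has failed", "severity": 9, "domain": "storage"}
    for message in distribute(flare):
        print(message)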
