VIKINGIVESTERLED: From the experiences of an Airline Executive: 2011

Tuesday 13 December 2011

Mass marketing via email

Mass marketing via email has for some time been an effective way of marketing to an established set of customers. You should always encourage your customers to include their email address when they register. If you include clauses (with opt-out buttons) that let you use third parties, it will increase the value of your list. Be careful though: you don’t want it to be a list of people who have banned you from sending to them.

Genuine offers are required. Your customers will feel extra good about an offer if they receive it before everybody else, for example if you send the email the night before the general release of the offer.

How to send to the list you have collated is a matter of choice. If you are a regular mailer with knowledge of emailing, I would recommend a self-run package due to its lower cost in the long run. It also leaves you in control of the list, a list that can have a value of its own. If you contract out, be careful to have clauses in the contract that expressly say the list of email addresses is yours and yours alone, and is to be handed back if the contract ends.

A good one for self-hosting is Listmanager from Lyris (http://www.lyris.com), a very high capacity and fast mailer with response tracking facilities. A couple of hundred thousand emails an hour should be no problem. It also handles spam filters very well, an important consideration for repeat mailing. You don't want all your efforts to be filtered away, which is a problem with free and cheap mailers.
Lyris also has good support.

Always send from a domain that can be looked up via reverse DNS. This is the first test applied by most filters. Also set up a real reply address where you can handle automated verification responses from spam filters and answer spam complaints from the likes of Gmail. Take care to have an opt-out link at the end of every email you send.
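As a rough illustration of the reverse DNS test, here is a minimal Python sketch. It assumes the filter does a forward-confirmed check: the sending IP address must have a reverse (PTR) name, and that name must resolve back to the same address. The function name and the two injectable lookup parameters are hypothetical; by default it uses the standard library calls socket.gethostbyaddr and socket.gethostbyname_ex.

```python
import socket


def passes_reverse_dns(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip has a reverse name that resolves back to ip."""
    # Default to real DNS lookups; fakes can be injected for testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        hostname = reverse_lookup(ip)
        addresses = forward_lookup(hostname)
    except OSError:
        # No reverse record, or the name does not resolve. The socket
        # lookup errors (herror, gaierror) are subclasses of OSError.
        return False
    return ip in addresses
```

Run it against the address your mail server actually sends from before your first big mailing. Real spam filters apply many more tests than this one.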

You need to monitor your sendings carefully. A good system should track progress and put addresses with unsuccessful deliveries on hold. However, a general problem with the transfers can lead to a list where every address is on hold. See to it that they are not wiped off your list too quickly. I would also recommend subscription confirmations via email, to verify genuine email addresses and weed out the mischievous.
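One way to avoid wiping good addresses after a general transfer problem can be sketched as follows. This is an assumption about how such a rule might look, not how any particular mailer works: the function name, the 50% outage threshold and the three-strikes limit are all illustrative.

```python
def update_hold_counts(fail_counts, results, outage_fraction=0.5, max_failures=3):
    """Update consecutive failure counts after one send run.

    fail_counts: dict of address -> consecutive failed runs so far
    results: dict of address -> True if delivered, False if not
    Returns the set of addresses that are now candidates for removal.
    """
    if not results:
        return set()
    failed = [addr for addr, ok in results.items() if not ok]
    # If most of the list failed, assume a general transfer problem
    # and do not hold the run against individual addresses.
    if len(failed) / len(results) > outage_fraction:
        return set()
    for addr, ok in results.items():
        fail_counts[addr] = 0 if ok else fail_counts.get(addr, 0) + 1
    return {addr for addr, n in fail_counts.items() if n >= max_failures}
```

An address is only proposed for removal after several failed runs in a row, and a run where most of the list fails is ignored altogether.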

Integrate your emails with your website so you can track the responses and uptake/sales from each mail. A good package should have this included.
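If your package does not do this for you, the basic idea is to tag every link in the email so the website can tell which mail a visit came from. A minimal sketch, where the function name and the parameter names campaign and rid are assumptions for illustration only:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def tag_link(url, campaign, recipient_id):
    """Append campaign and recipient identifiers to a link's query string."""
    parts = urlparse(url)
    # Keep any query parameters the link already has.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("campaign", campaign), ("rid", recipient_id)]
    return urlunparse(parts._replace(query=urlencode(query)))
```

The website then logs these two values with each visit and sale, so the uptake can be counted per mailing.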

Creating a regular stream of similar emails can be a chore. Some packages also offer automated help with composing the emails, allowing daily marketing to become somebody’s part-time job.

Saturday 10 December 2011

The importance of selling your company name and logo

If you retail in the market for interchangeable consumer goods, it’s important to build up the customer’s knowledge of your brand. You should take every opportunity to display your logo and, just as important, your brand name. It is easier to get shelf space in a chain of stores if their customers already request your product, and the only way they can know it is by you marketing it to them. This can take many forms: traditional advertising in papers, on TV or radio, or more modern forms like a website, Wikipedia, or social media like Twitter, Facebook or Google+. A combination is often required to gain momentum.

To get repeat custom they also need to know that it was your product they purchased. That means displaying your brand name and logo very clearly on the product, and not being content with a small mention in the barely visible text on the pricing label.

It is surprising how many people who have worked a while in the same company take it for granted that everybody knows about it. The truth is that many times I was in contact with people from the UK who asked “Ryanair who?” when I contacted them about something other than travel. People might not recognise the company if the query is unrelated to its most visible activity. Also, a company well known in a local area might be nearly unknown outside that area. This can be an issue if your marketing people are local and you are trying to market nationally, or even globally.

If you have a website, it’s important to show your product on it. If visitors click the Products button there should be pictures of what they can actually purchase, with proudly displayed logos and company name. Try to find a way of selling your product on the web as well. There usually is some way of making a variant of the product that has a longer lifespan, so it can be shipped the cheapest way. It will introduce customers nicely to your product range, and there will be no delay between seeing and purchasing. Get good deals on shipping to as many destinations as possible and display them on your website. You never know where the next purchaser might be from. Selling online can be done very cheaply using established payment channels like PayPal, and can take little effort to accomplish. Most important is that it gives you a way of better controlling the presentation of the product. On your site there is no big supermarket chain that sets a markup pricing you above your competitor, or gives you inferior shelf space.

If you insist on not selling directly, lead potential customers from your product page to outlets that stock your product, and check regularly that they actually still do. A customer will not like travelling to a store to get your product only to discover that they don’t stock it after all.

Wednesday 7 December 2011

IT a cost or a source of income

Is IT a cost to be minimised or a source of income, sometimes in a roundabout way, to be maximised? For a long time companies have seen IT as a cost centre, a necessity. IT started out as a way of giving your business an advantage. As your competitors got the same or similar systems, IT became more of a necessity than an advantage. This has led to the thinking of commoditising IT: seeing IT as a service that can be bought from somebody else, for example in the recent growth of cloud services, a way for hardware makers to sell you hardware without actually giving you the stuff.

The people in IT will then want to redefine themselves. This, together with IT directors and CIOs trying to gain entrance to the very top management, means that they will try to make more of a mark. Why not by becoming a source of revenue instead of being a cost?

Early attempts led to the invention of internal cost centres. The problem here is the word cost. Now other departments see IT as even more of a cost, something that can be bought internally or sourced externally. It brought the cost of IT to people who before weren’t used to paying directly for anything, and with that came a backlash.

If we think about it, IT is already a source of revenue generation for many businesses. The growth of the web has seen to that. IT runs the websites that, in some companies, all sales go through. However, this is seldom attributed to IT, even though they often sourced the system, organised all the technical necessities, and created the site itself. They are seen as just the facilitator and, out of the goodness of their heart or for lack of being present at the top where these things are discussed, IT has let the company continue to believe that.

There are many other examples of where IT is revenue generating. Take the customer helpdesk or call centre. Sometimes originally an extension of the IT helpdesk, it facilitates the continued sale of your products, and at some companies the customer pays for the contact.

What IT becomes in a company often depends on the person in the IT director role (plus his/her team), sometimes, but not necessarily, combined with the interest the CEO has in the area. If your IT management is inventive and has good business acumen, they will see the potential in the new trends of the market in areas like social advertising, network building and customer interaction. With the current rate of development, IT has not yet reached the stage where it cannot bring business advantage to the inventive who can redefine the market by being different. To state anything else is to say the world has reached its pinnacle of development and there is nothing more to be done.

For the rest, maybe commoditisation is the way forward. There are certainly enough internal naysayers, and external forces that see profits to be made, for that to work too. But then you have given away a potential avenue for making your offering different from everybody else’s. If you sit down and wait, eventually a competitor will seize the advantage.

Sunday 4 December 2011

Social media a faux pas danger or advertising possibility

In recent years social media has become a way for “normal” people to express themselves on the internet. Before, you had to use forums, where mostly other people set the order of the day. Today you have a number of possibilities like blogs, Twitter, Facebook and now also Google+.

Modern websites need to keep up with these new media by adding interaction, like +1 and Like buttons, that allows you to communicate directly with the people (= potential customers) who interact on these sites.

Many new and smaller businesses do take it up. It is probably easier to organize if you don’t have to go through several layers of corporate bureaucrazy for a decision to use the media to be made. It also takes somebody with an interest, yes nearly a passion for it, to keep up with the constant changes needed. The modern web is about constantly changing content and interaction. No longer is it enough with a single page with your product and contact details. The more you interact the better your site will be responded to and the more potential customers will know about you.

More established companies are more afraid of what it can do to their current reputation than interested in the potential it brings. Some companies insist that nothing should be done by anyone who isn’t authorised, meaning the company spokesperson or the CEO him/herself. In the day of social media that person is just one voice though. So even with a powerful voice and a lot of resources behind them, they will have problems reaching as widely as the company’s thousands of employees could.

Instead of trying to stop this potential, one should look into better ways of spreading the word in a controlled fashion. In a way it has been tried in the past where, for example, companies would invite their employees to vote for them in award competitions. And I don’t just mean “employer of the year”.

The company that doesn’t follow the customer risks being left behind. For some years the way of advertising on the internet was with email campaigns or advertisements placed on webpages or internet search engines like Google. Now even large companies consider stopping using emails, switching to other forms of communication. There is talk of a move from search engines to social sites like Facebook, which could be one reason Google has shown interest with its +.

In the internet age we move fast, and newcomers can quickly become the established, only to be overtaken again. Those who don’t follow can easily be left behind. Ignore at your peril.

When did the full dump ever help

Most advanced systems will automatically do a dump of their memory, or prompt you to do a dump, if they recognise that a failure has happened. Some OS software vendors also love the dump. The problem is that if the system was able to recognise the cause of the crash, it wouldn’t have crashed in the first place, and a dump is usually just a snapshot of what is in memory at that exact time and tells little about the actions prior to the problem.

If a system were able to recognise a crash situation, it would mean the vendor knew when building it that this could happen and included a way of logging it. If they had known it could happen, they would have put in measures to prevent it in the first place. Most crashes are due to unforeseen circumstances and can therefore not be logged.

A downside with dumps is that they usually become very large and take a long time to extract. If you successfully extract them, the tools to analyse them are either long-winded or produce results that are difficult to interpret. This means you usually have to upload them to an OS or hardware supplier’s site. Internet connections have increased a lot in capacity, but the amount of data in these dumps means you will be clogging a link for a long time.

After all that, 99% of the time you will get back “nothing found”. I would advocate that it is a lot better to do targeted log extracts. As an admin you will usually have an idea of where the problem lies. Work with the developers or suppliers of the application you run to find the best tool for logging what is going on. Then play with the parameters of the logging tool while you put load on your system. This may take some provocation, like artificially increasing the load or reducing the capacity of the system. That is easy if you have a multi-computer system: turn off some of the resources. But even on a single system you can, for example, limit the number of processors used or run up an additional load (which can come from an additional dummy program).
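A targeted log extract can be as simple as pulling the lines around the time of the problem that mention the component you suspect. The sketch below is only an illustration: the function name is hypothetical, and it assumes every log line of interest starts with a timestamp in the form YYYY-MM-DD HH:MM:SS, which will not be true of every application.

```python
from datetime import datetime


def extract_log_window(lines, start, end, keywords=()):
    """Yield lines stamped within [start, end] that mention any keyword."""
    fmt = "%Y-%m-%d %H:%M:%S"
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], fmt)
        except ValueError:
            continue  # skip lines without a leading timestamp
        if start <= stamp <= end and (
            not keywords or any(k.lower() in line.lower() for k in keywords)
        ):
            yield line
```

A few hundred lines extracted this way are a lot quicker to send to a supplier than a multi-gigabyte dump.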

There can be many different causes of system crashes and malfunctions. I have experienced, amongst others, missing non-public patches, a bent processor pin, bad programming and the reaching of system limits. These last can be, and have been, in the OS, database, application and hardware. What I haven’t experienced is any of them being diagnosed correctly and the solution found from a full system/memory dump.

Friday 2 December 2011

Danger of overcomplicating

Today there are many add-ons to operating systems and databases that will keep systems running or automatically fail them over if a problem is detected. Well and good as long as everything runs according to plan, which of course it never does.

What of the undetectable problem? There is no reason a well supplied and administered system should fail for a known issue (unless that issue has been kept secret by the supplier). If the issue is known it should have been patched. But there are many possibilities for issues. Very few systems are exactly the same, due to the manual ways of installing a system and the many permutations possible when it comes to server, storage, networking, OS, database, application, add-ons and the patching of them all. This usually means that a change can at any time lead to an unexpected event. The only way of fully allowing for this is to have another unchanged system, just in case.

Do not fall into the trap of creating more problems than you guard against, resulting in more downtime rather than less. What these automatic add-ons do is add complication: more layers of things that can go wrong. There is a lot to be said for the old manual failovers or restarts, as long as there is a 24x7 human interface in place. Yes, they had a time delay in data replication, but this could be controlled by you, the admin, down to a level acceptable for the company. Most can live with that if it means higher security of the system (= less dependency on the “no system available” manual routine) and higher security regarding the maximum downtime.

Often the fastest resolution is a quick reboot, or to fail over to a completely independent system. Because that system runs a bit behind the main system, it can be caught in a non-failed state. If you make systems that can fail over automatically, you often have, for example, the databases running exactly in sync. This can lead to both databases having the same error. You can also have problems with the failover process, and a worst case scenario is that both systems end up in a hung state.

Not that a manual secondary system is any guarantee. It requires strict discipline by the admin to see to it that it is fully updated to a runnable state. Regular testing will be required, and I would recommend regular planned switching between the live and the standby system. This ensures that both are in a production-capable state when you need them.

Automation and full synchronisation can give problems at the time of upgrade or patching. How do you patch in such a way that at least one full solution is available in a pre-patched state, and stays that way until you know that the patch isn’t going to cause any issues? What do you fall back to if you upgrade your live and your standby as one?

Thursday 1 December 2011

Decision making and the art of management

You are a manager, take that decision and live with it. The advantage is that if you select door a over door b, nobody will know what would have happened if door b had been selected as long as you see your decision through to fruition.

In all decision making there is a bit of “no risk no rice”. You make your choices and take your chances. But see to it that you get a result. Nobody can know for sure that the other choice would have been better, as long as you make your choice do the job. Abandoned projects in the technology sector are a sign of lack of work at the pre-project stage, unless you are engaged in R&D. There is a reason it’s called the bleeding edge. There is no reason why a company where IT supports the primary business, rather than being the primary business, should need to be on it.

That said, a successful new way of using IT in a business can give you a competitive edge, but the risk needs to be managed and the potential cost known upfront.

If you want good admins, enable them to make some decisions. But always remember that you can delegate the decision but not the responsibility for it. Train them in your way of thought so they know the direction you would have taken, and therefore are most likely to approve of.

There is nothing wrong with a healthy discussion of alternatives, if there is time. Hearing somebody else’s view could increase your own knowledge and let you see possibilities you might not yet have thought of. If you are sceptical, play devil’s advocate and imagine the worst possible scenarios to see if they have a solution for those too.

The worst thing I see is management bringing in consultants to make the decision for them, a popular way in government and the bureaucrazy. If management can’t make decisions, maybe the problem is just there. It leads to a lack of accountability, and accountability is a very important part of management.

Customer support an asset or a burden

Is there a point in ignoring customer support if it can be done for negligible additional cost? Many companies see customer support as a way of retaining previous customers so they come back for more. Others see it as a legal requirement that must be borne. Some see it as a way of making their business stand out from the crowd. Others see it as nothing but a cost.

Support can be done for little or no money. An FAQ on your website is the easiest example. It takes little time to assemble a list of possible questions and answers about your product. Other forms are more resource dependent, like having somebody actually answer the questions that customers send in, but this can be made profitable by making the customer pay extra for the privilege, for example through support contracts or a call centre with a premium phone line. Selling insurance can also be a way of taking payment for your support, or of outsourcing it to a third party. A modern way in the internet age is to facilitate a live question and answer page where the answers are provided by third-party agents who finance their time by leading you to additional paid-for ads or services.

Some forms of support can be seen as profit reducing. On the shadier side of business, an example can be telling potential customers how to avoid the built-in pitfalls in the purchasing process so they can reach the best bargains. Here there is a fine line between naturally occurring issues with the purchasing process and deliberately engineered profit-making problems, or, alternatively, not prioritising fixing issues when they are discovered. Luckily for your customers, if your business is very large there are third parties at hand, let’s call them agents or facilitators, who will help them overcome or bypass the issues, for a fee of course. Thanks to the search engine, many websites can also be found that will help a frustrated customer. If you are a business owner you will have your work cut out finding them and, let’s say, ensuring that they are corrected.

Is no customer support a bad thing? Not necessarily, if you have a unique selling point that brings customers back to you regardless. For example, if you are a monopolist, those who want your product have no choice but to buy it from you. This is one of the reasons monopolies are frowned upon legally. They take a lot of time and effort to police to avoid questionable profiteering.

Importance of monitoring what you have outsourced

Your outsourced system is never as important to your supplier as it is to you. Most contracts have check intervals counted in minutes, and by the time the set number of alarms (required to avoid false positives) has been triggered and an operator has been alerted, 15 minutes can easily have gone, and 30 minutes or more before anybody takes it in hand. Since you squeezed the price you pay for the service down to the absolute minimum, the agreed penalty is seldom in proportion to what the outage means financially to your organisation.
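The arithmetic behind those 15 and 30 minutes is worth doing against your own contract. A minimal sketch, where the function name and the example figures (5 minute checks, 3 alarms before an alert, 15 minutes before somebody picks it up) are assumptions chosen for illustration:

```python
def minutes_until_action(check_interval, alarms_needed, alert_delay, pickup_delay):
    """Worst-case minutes from outage start until somebody works on it."""
    # The outage may begin just after a check, so every required alarm
    # can cost a full check interval.
    detection = check_interval * alarms_needed
    return detection + alert_delay + pickup_delay


# 5 minute checks and 3 alarms give 15 minutes before the operator is
# alerted; another 15 minutes before it is picked up gives 30 in total.
print(minutes_until_action(5, 3, 0, 15))
```

Put your own contract's numbers in and compare the result with what an outage of that length costs you.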

Another reason for doing your own monitoring is that it will give you the unmasked truth. Do you trust your supplier to always tell you what’s going on? Are their answers at times vague or slow in coming?

The easiest way to see traffic is by network monitoring. A simple network graph from a tool like Utilwatch will give you second by second information, and can run on the cheapest oldest pc you have. If it’s running in the background but within your field of vision you will immediately know if something is amiss. Experience lets you interpret the data better. You can also via simple scripts create easy traffic-lights.
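The traffic-light idea can be sketched in a few lines. This is only an illustration and not something taken from Utilwatch: the function name, the normal band and the tolerance are assumptions you would tune from your own experience of what normal traffic looks like.

```python
def traffic_light(mbit_per_sec, normal_low, normal_high, tolerance=0.25):
    """Classify a traffic reading as 'green', 'amber' or 'red'."""
    if normal_low <= mbit_per_sec <= normal_high:
        return "green"
    # Allow a margin outside the normal band before raising the alarm.
    margin = (normal_high - normal_low) * tolerance
    if normal_low - margin <= mbit_per_sec <= normal_high + margin:
        return "amber"
    return "red"
```

Note that too little traffic is treated as just as suspicious as too much, since a flat line often means the site is down.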

Cheap second-by-second tools do, however, seldom store the data. They are WYSIWYG: on-screen current display only. You seldom need to store this much data though. The interpretation depends on other factors at the time. Did you start or stop something? Were your web caches reloading? Was the blip due to scheduled maintenance? A simple screenshot will capture the moment for later inclusion in a manual log together with comments.

There are also many tools that let you set up triggers and alarms to your own liking. I would pick at least one that isn’t from the supplier of what you are trying to monitor. If the supplier knew how to monitor it and trigger the alarm, they would (or should) have fixed the problem in the first place.

Some, like ipmonitor, are also cross-platform and store the history of previous alarms if configured correctly. If your urgency is lower, and/or your problem occurs outside normal hours, tools like Cacti will give you a view of last night’s, week’s or month’s proceedings.

If you don’t feel like spending time or effort on monitoring yourself, but still see the value of an outside eye on your hosting/network/resource provider, there are many third-party suppliers that will happily let you try their monitoring services before you buy. But then you are back to a 15-30 minute response instead of seconds again.

Wednesday 30 November 2011

Encourage cross department suggestions/initiatives

In companies, due to naturally occurring internal competition, there are many barriers to cross-department or cross-field suggestions and initiatives. Sometimes an outside view can be advantageous: the view of somebody who has some insight but doesn’t normally work with the subject. Why go to an outside consultant when there are probably many within your own organisation who have ideas on the subject but no way of exploring them?

Set up an internal discussion forum where ideas can flow across natural divisions. In this day of the web there is no need to have a meeting about it, where those who like to hear their own voice, rather than those who have something useful to say, are most likely to rule the roost. Sometimes it’s the quiet thinker who has the deepest thoughts.

Filter the suggestions, and let somebody from the department whose responsibility the field is spend a little time now and then mulling them over, arguing against them or stating why they are impractical if needed. It will help answer the question “why don’t we do things this way or that way?” and drive the whole organisation towards the stated goal, creating a great sense of understanding and inclusiveness.

After all, what is the cost of lending an ear to new ideas, except for a little time? The gains could be significant.

This could be especially valuable in a customer-facing organisation, or one that wants to be customer friendly. The flow of information from the customer to your business does not always come through the planned channels in this era of online social networking. Much can be gathered via sites like Twitter, Facebook and Google+. Many companies “can’t afford” to monitor these sites, or those who do monitor are not in the right circles. You have a whole workforce that uses these media in their private time though. Utilise this resource.

This approach does require that the management of the company sees the employees as a resource and not just as a cost to be minimised. There are many talented people out there who, given the right opportunity, could shine, even if unexpectedly. The first thought when finding an employee not thriving in their current position should be to see if they could be a better fit somewhere else, now that the organisation has learned their strengths (and weaknesses).

Switching an art in change

When will Cisco move on from the dark ages of the command line and create a graphical interface that can handle all flavours of its hardware? Or is the key to its “popularity” that it requires a specialist to handle it, in such a way that every company of any size has one who de facto becomes the networking specialist and therefore has a say in what is purchased, upholding the status quo?

Other platforms, like 3com, could be handled by any admin thanks to its realisation that we live in a Windows world. But the admin didn’t need to become a “networking specialist”, meaning he didn’t need to practise the dark art of programming from the command line, and therefore was not seen as the networking guru. It was a sad day, and the beginning of the end, when 3com tried to make their interface more like Cisco’s. That is one thing HP should not follow up on after they bought the company.

Switching is in the middle of a revolution with the advance of blade servers. Large companies would previously merge all their standalone switches into large chassis, creating a single unified unit for switching. With blades, more of the individual server connections are handled internally, and only the central part is done by a separate switch. Here there is a task for the server vendors: to have a separate but integrated choice of 10Gb switches available. And I am talking copper here. Fiber is vulnerable to kinks over short distances and to dirt on the connections, so it is best suited for longer distance communication, like building to building, campus to campus or longer. Within the room or within the floor there is nothing that beats the simplicity and standardisation of Cat cabling. 10Gb is not quite there yet when it comes to standards though. Special cables for each manufacturer’s equipment are not the way to go if you want your solution to spread widely.

When you do get 10Gb in, you have the challenge of utilising it, and that includes monitoring that you do reach the possible speed. Now we are talking server to storage, and racks of other media that would previously have depended on fiber for connectivity above 1Gb. Second-by-second monitoring is required, and I can recommend Utilwatch. You’ll be lucky if you see even 2Gb/sec utilisation, so there is a lot to be gained for hardware manufacturers in ramping up the performance of their equipment. You can help by getting SSD disks, discussed in the article “SSD a step towards instant computing”.
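If you want to check the figures yourself, the throughput can be worked out from two readings of an interface's byte counter taken a known number of seconds apart. A minimal sketch with hypothetical function names: where the counter readings come from (SNMP, the OS, the switch) is left open, and counter wrap-around is not handled.

```python
def gbit_per_sec(bytes_before, bytes_after, seconds):
    """Average throughput in Gbit/s between two byte counter readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_after - bytes_before) * 8 / seconds / 1e9


def utilisation(gbps, link_gbps=10.0):
    """Fraction of the link speed actually used."""
    return gbps / link_gbps


# 250 million bytes moved in one second is 2 Gbit/s, or a fifth of a 10Gb link.
rate = gbit_per_sec(0, 250_000_000, 1)
print(rate, utilisation(rate))
```

Taking the readings one second apart gives the second-by-second view. Longer intervals smooth out exactly the peaks you are looking for.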

While we are on copper versus fiber, and iSCSI: who let the fiber boys hijack the convention for iSCSI node naming? It would have been much more convenient if this had followed the IP standard rather than the complicated naming concoction of fiber. If copy and paste is not your friend, due to two separate systems with security between them, you are out with pen and paper to transfer connection data from server to storage and vice versa.

Tuesday 29 November 2011

HW support in a time critical environment

Not long ago, hardware support on your critical servers meant that when you called the engineer out, he arrived with a boot full of parts. Whatever part you thought faulty was changed, and if that didn’t fix the problem he would try a number of other possibilities. This meant the engineer was fast on site and then able to do the diagnostics and rectification in a single swoop. How things have changed, and not for the better.

The callout takes a lot longer to accomplish now. First you might have to talk to a call centre outsourced to the third world. If your company is English speaking and the support centre’s native language is not, you’ll get by as long as at least one of the parties has English as a primary language. Problems start building when neither of the two parties has the common language as their primary one.

Next you will have to do a lot of diagnostics to pinpoint exactly the faulty part, because that is the only thing that will be sent to you. And yes, I did say sent, because these days the part comes directly from an outsourced supplier and not with the engineer. This means the engineer will want to ensure that the part is on site before he/she is, just so he/she won’t waste any time, as if that was better than wasting yours. Expect to waste 1-2 hours from when the part arrives until the engineer arrives.

If that part was not the only failed item, the process starts over, but this time hopefully helped by the engineer now onsite. Unless he/she decides that the next part is unlikely to come inside his/her duty hours and sneaks out the back door.

And remember in all this, the contracted maximum onsite response time often only starts ticking when the problem has been diagnosed by phone and the part/engineer is being dispatched. This often results in a 4 hour onsite promise becoming several hours of diagnostics by telephone, with diagnostic instructions and files flying back and forth, and then up to 4 hours for the part/engineer to come to site.

There is also a tendency for hardware suppliers to assume that all means of transportation will function for their distribution. So for rare or just very new types of systems, this might mean the missing part has to be flown to the destination. Don’t expect that to happen if another cloud of ash darkens the sky. Or what if your hardware is broken due to an event that has stopped air traffic, like 9/11?

Is it not incredible that many hardware suppliers have a problem identifying your specific setup every time you call them? Even if that server is the only one you have from that specific manufacturer, you can be sure that every time you call them you have to give them serial numbers and part numbers, instead of them just looking up your company name and saying “yes we can see it here on our system”.

Vendors need to come up with a way of recording, on their internal systems, the customer’s own name for each system. This needs to be part of after sales, a much neglected area. For many hardware vendors there is no such thing as “after sales”. It is completely handled by support, and they are reactive, meaning they only kick in when a problem occurs and the customer contacts them. Somewhere in between there needs to be something extra. And outsourcing it to an agent does not work. They only get paid for sales, and won’t be directly affected if support has issues.

Monday 28 November 2011

Cooling in a damp climate

On the other hand you have the problem of cooling such a concentrated hotspot. And air conditioning is not among the most stable of devices. Your indoor environment is sensitive to the smallest bit of sunlight and the outdoor units are very vulnerable altogether. It ends up spending a lot of time de-icing, so see to it that your runoff is adequate. This can be a problem when your fire extinguishing system needs a completely sealed room, and your insurance needs it to be pressure tested.

A cool but humid climate is not always the best for a datacenter. Yes, you need to run your airconditioning slightly less, but you get a lot more de-icing issues. This is one of the reasons reverse cycle airconditioning for home heating never gained popularity in Ireland, compared to colder but much drier climates like Scandinavia. If you have a weather station with a separate outdoor unit you will know what I mean. They spend a lot of the time showing a humidity error because of very high values.

Underfloor cooling was meant for network racks where the passage through the rack is unobstructed due to the shortness of the equipment. Full length servers block the flow of air through the rack, so it’s better to give it cold air at the front and remove the hot air from the back of the rack. This way you create cold aisles in front of racks and hot aisles behind racks. If you have several rows of racks this does require that every second row is turned the opposite way, so that one server’s hot exhaust does not become another’s cooling air intake.

A downside of hot and cold aisles is that where you are most likely to work, at the front where the console is, is also the place where there is a constant cold draft. You could alternatively place the consoles at the back. It eases the cabling. These days it’s more normal to remote control the whole room so there is little need for direct human access. And you could also increase the general temperature of the room slightly. Rather than set it at 19c you could experiment with 22c.

Few will run their cooling via the ups due to the large power demands and the resulting shortening of the ups running period at the time of a grid failure. If your computer room has generator backup, you will need to restart your cooling with the generator. Lack of cooling will make your equipment’s internal fans increase in speed as the room temperature goes up, eventually overloading fuses and cables.

You can temporarily rectify the situation by pumping cold air in from the outside or redistributing the air already in the room better by a dedicated fan and an extendable tunnel, easily and cheaply bought from a hardware store like MachineMart.

Due to the vulnerable nature of airconditioning you will need to overdimension. You should have at least enough that 1/3 of the cooling capacity can be offline for maintenance and you are still able to keep the temperature within range.
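
The one third rule is easy to check before you sign for the units. A minimal sketch in Python, assuming identical units and rounding the offline third up to whole units; the function name and the kW figures are made up for illustration.

```python
def survives_maintenance(units, unit_kw, heat_load_kw):
    """True if cooling still covers the heat load with a third of the units offline."""
    offline = (units + 2) // 3                  # a third of the units, rounded up (assumption)
    remaining_kw = (units - offline) * unit_kw  # capacity still running
    return remaining_kw >= heat_load_kw

# Hypothetical room producing 30 kW of heat, cooled by 3 units.
print(survives_maintenance(units=3, unit_kw=10, heat_load_kw=30))  # False
print(survives_maintenance(units=3, unit_kw=15, heat_load_kw=30))  # True
```

Three units that only just match the load between them leave you nothing to spare, which is exactly the situation you discover on the day one of them is out for service.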

It can sometimes be difficult to spot a failing airconditioning unit. Simple filter or other error messages on the control panel are mostly self explanatory, but sometimes you have a rise in temperature without any message. Check that the air it blows out is actually cold. Sometimes they keep on running but just blow out the same air at the same temperature as it went in. Especially if the outdoor part of the unit has failed.

I will again point out the importance of an environment monitor. They are relatively cheap for what they protect and the same one that monitors your power can also monitor the room temperature. Place sensors in several different positions since it’s highly unlikely to be a uniform temperature in the whole room. And single failures can result in hotspots.
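
What you do with several sensors can be sketched in a few lines of Python. This is only an illustration, not how any particular monitor works: the sensor names, the readings and the 25 degree limit are all assumptions.

```python
def find_hotspots(readings_c, limit_c=25.0):
    """Return the names of sensors above the limit, hottest first."""
    hot = [(temp, name) for name, temp in readings_c.items() if temp > limit_c]
    return [name for temp, name in sorted(hot, reverse=True)]

# Hypothetical readings from cold aisle sensors in different parts of the room.
readings = {
    "row-a-intake": 21.5,
    "row-b-intake": 29.0,
    "row-c-intake": 26.5,
    "door": 20.0,
}
print(find_hotspots(readings))                             # ['row-b-intake', 'row-c-intake']
print(max(readings.values()) - min(readings.values()))     # spread across the room: 9.0
```

A single sensor by the door would have reported a perfectly healthy room here.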

Sunday 27 November 2011

Explosion in power needs

In recent years there has been an explosion in the power requirement per rack. Not long ago you got 2*16amp sockets, for a and b side, and that was it. And it was like that for 10 years. Then came the higher density of blades, where 16 servers could now fit in a space previously populated by 10 or sometimes just 5. On top of that each server would have more cores and each chassis would have to have psu’s to cater for its top spec. Pretty fast you are requiring more like 4*32amp per 10u and fuses were tripping all over the place.
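
To put numbers on that jump, here is a rough Python sketch. It assumes single phase feeds at 220v and simply multiplies sockets, amps and volts; the function name is made up, and real limits depend on your breakers and how your a and b sides are loaded.

```python
def feed_watts(sockets, amps, volts=220):
    """Nominal power available from a set of single phase sockets."""
    return sockets * amps * volts

old_rack = feed_watts(sockets=2, amps=16)    # the classic a and b side
blade_10u = feed_watts(sockets=4, amps=32)   # what 10u of blades can ask for

print(old_rack, "W for a whole rack")        # 7040
print(blade_10u, "W for just 10u")           # 28160
print(blade_10u // old_rack, "times as much")
```

And remember that if the a and b sides are there for redundancy, you can only count on half of the nominal figure without losing that redundancy.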

Yes, you can power manage by limiting the power each server and chassis can use, but then you can never run at your top capacity, so why did you buy it? You will also have startup issues if you have total power failures.

For security against the frequent failures or just scheduled maintenance of the normal power grid, most companies with in-house servers have some form of ups system. The problem is they seldom last for more than 10 minutes, or 20 if you are lucky. They are based on batteries, and batteries are not a good way of storing any significant amount of power when it comes to appliances that use large amounts at 220v.

And what can you do in, let’s say, 15 minutes? It’s hardly enough time for an admin to shut down the most essential databases. (Oracle does not enjoy a sudden and complete loss of power). Most will use the best part of that time just to raise the alert. Here an environment monitor like Avtech is worth its weight in gold for fast sms notification.
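
The time budget can be sketched like this in Python. All the minutes below, the shutdown steps and the function names are assumptions for illustration; measure your own during a planned test.

```python
def minutes_left(ups_runtime, alert_delay, admin_response):
    """Minutes remaining for the actual shutdown once somebody is at the keyboard."""
    return ups_runtime - alert_delay - admin_response

def shutdown_fits(step_minutes, available):
    """True if all shutdown steps, run one after another, fit in the time available."""
    return sum(step_minutes.values()) <= available

# Hypothetical ordered shutdown, in minutes per step.
steps = {"stop application servers": 3, "shut down database": 6, "power off storage": 4}

slow = minutes_left(ups_runtime=15, alert_delay=5, admin_response=5)
fast = minutes_left(ups_runtime=15, alert_delay=1, admin_response=1)
print(slow, shutdown_fits(steps, slow))   # 5 False
print(fast, shutdown_fits(steps, fast))   # 13 True
```

The ups is the same in both cases. The only difference is how fast the alert reaches somebody who can act on it.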

Most companies above a certain size will back up their ups with a generator. And I do say “a” because very few, besides dedicated data centres that offer services to third parties, have more than 1. What they forget is that a generator is more like a car. How sure are you that your car will start first time after standing idle for a few weeks? Regular testing is required but most generators stand around for many years, so now we are talking about a 20 year old car. Yes it doesn't have much mileage, but that is not always a good thing. Diesels like to be run.

If you try to solve this with a second generator you are in for a very complicated and vulnerable failover system, to ensure that every part is redundant. And somewhere in the middle there will be some sort of vulnerable failover switch. Remember also you don’t want to make it so complicated that it induces more risk than what you were guarding against.

You could try to get a second grid supply but in most places you will find that an actual physical separation on the supply side is nearly impossible. Competition just hasn’t got that far. You will also run into the same problem as for a second generator, how to feed power from 2 sources.

Saturday 26 November 2011

SSD a step towards instant computing

Ever since I first started working on optimizing server performance I have felt that the ultimate goal is instant computing. I define instant as no delay perceivable by the user from request to result. Unfortunately few suppliers have set such, for the outside observer, quite natural goals. They are usually just happy with a bit faster than last year or a bit faster than the competitor. So you will run into a load of configuration limits for system parameters that haven’t kept up with the explosion in hw possibilities combined with the lowering of price/performance.

As soon as you overcome one bottleneck it’s on to the next one. Part of this quest has been to get as much of the data into memory as possible to overcome the slowness of traditional spinning platter disks. With the arrival of ssd’s I thought we could be close to this goal. And for email searches in Outlook, as an example, it’s close. If you have a few thousand emails, searches take an age because they run against the cache your Windows pc stores locally. If you use an ssd disk in your laptop/pc it’s down from minutes and sometimes hours to seconds. The greatest leap ever, but so little appreciated that even Dell stopped (for a while) offering ssd’s as an option even on their high end pc’s.

The greatest gain for servers is obviously where there is a high frequency of ever changing data, like database logs. Unfortunately this is also the one area where the recommendation is not to use them, due to the ssd’s limit on total rewrites. There is work going on to automatically exclude areas that near this limit, though not fast enough for some who reached it, with total failure of whole disk shelves as a result. This write limit should also be a thought for san manufacturers that automate which type of disk the different types of data are stored on depending on their frequency of access. Maybe one should just take the penalty and routinely change out the disks about every 18 months. An easy task with proper raiding. And if you went with the cheaper server type or medium sized storage ssd’s instead of the super san = super expensive ones, still a cost effective way.
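
Whether 18 months is a sensible swap interval for you can be estimated roughly. The Python sketch below is a simplification under stated assumptions: it treats endurance as capacity times a rated number of rewrite cycles, and the capacity, cycle count, daily write volume and function name are all invented for the example. Take the real figures from your own disk’s data sheet and your own log volumes.

```python
def endurance_days(capacity_gb, rated_cycles, daily_writes_gb, write_amplification=1.0):
    """Rough number of days until the rated rewrite limit is reached."""
    total_writable_gb = capacity_gb * rated_cycles
    return total_writable_gb / (daily_writes_gb * write_amplification)

# Hypothetical 200 GB log disk rated for 3000 rewrites, taking 1000 GB of writes a day.
days = endurance_days(capacity_gb=200, rated_cycles=3000, daily_writes_gb=1000)
swap_interval_days = 18 * 30

print(days)                          # 600.0
print(days >= swap_interval_days)    # True, the disk should outlast an 18 month swap cycle
```

If the answer comes out below your swap interval, either shorten the interval or move those logs somewhere else.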

Aside from that log versus max total writes anomaly, databases have much to gain from ssd’s. Especially those so large that they can’t all be sucked into ram, or where there is a high frequency of updates and where one, as a security precaution, prefers synchronous writes to asynchronous.

Server internal ssd’s are actually an alternative for servers that were previously optimised by utilizing the caching ram of an external storage unit. This way you save considerably on your next system hw upgrade.

Friday 25 November 2011

Backups, art that needs reinvention

Some of the articles you see about data lost in the cloud are beyond belief. There is no excuse for losing data that was stored more than 24 hours before the problem happened. Most storage users will have a few snapshots and a dr tested way of restoring them. The problem comes when you go beyond the snapshots that are still on disk. Backup of snapshots to other media is still in its infancy. The most prominent backup solutions just don’t have it in them. And I have seen virtual server systems presented as complete solutions without a thought for how to get the data back if the thing burned down or, currently more likely, was drowned in a flood. There is a job here for a specialist in deduping, with the added flavour of a couple of extra copies.

There is a tendency to not treat virtual servers as real servers. Of course you can restore all the physical servers. But what about the virtual ones. With dormant or little used virtual servers a lot of them can fit on a few physical hosts. But the total data can still be the same as if each server was a separate physical. If you haven’t backed it all up, you need to at a minimum have a definite restorable master and a record of all the steps taken to create each one.

Nor should we forget the data people carry around with them. Laptops get ever more capable, most are now more powerful than servers were 4 years ago, and developers like to have it all at hand. A very important part of that time critical project might have its only copy on a thing thrown hither and thither every morning and evening. This is greatly encouraged by the cheap developer tools licensing we see emerge as a teaser to get more people onboard. And developers never were the first to think about what happens when things go wrong, or whether that online storage deal included a quantifiable and guaranteed backup/restore.

Often the issue is that it takes a long time for users to discover that their data is actually no longer there. Today even the smallest of users can have thousands of files. And since nobody learns about file systems and folders any longer, they never see the files except when they need them. It can take months or years if they are only used at the annual budget time or multi yearly planning stage. For that amount of data/iterations it is/was often uneconomical to store it all on disks. Besides, your auditor probably still loves the tape.

We also have the fast pace of the technology. A much used refresh cycle is 3 to 4 years due to the rapid rise in hardware support costs after the initial contracted support period. But the requirement is that financial data is to be stored for 7 years. Ask your IT department if they can restore you a 7 year old backup. Even if they have the tapes, do they have the drives to restore them with or the system to restore them on to? Not such a large problem if the software system is still in use and the data stored in a database. They are easy to migrate with the hardware refresh as long as you haven’t segregated out too much of the old to fit the new. Still you can always add some more modern storage to get that data back in, if you planned for that eventuality in the first place.

Relational databases – a quick look at flavours, strengths and weaknesses

Let’s start with the master of them all, Oracle. It’s the db with all the tools and tweaks, and it scales well. And if the price was right this is the one most would or should pick, however it seldom is. Oracle never followed the development in processors, where each core gets weaker but you get a lot more of them. Hence their penchant for charging per core and their customers’ liking of the HP Itanium processor.

Oracle is so advanced that it’s more like an operating system in itself and you need to take your patching seriously. Also be into your file system details. Play with the config files, there is a lot to be gained. It’s a pity the 3 defaults of small, medium and large are not more up to modern standards. Proper backups are essential.

Oracle does not like losing any of its data. And since a lot of performance can be gained from running it raw, a simple file system backup won’t do the job. You need to learn about dumps like dd, and have them done in the correct order. Exports are also very important. In addition to being a secondary way of doing backups, they can also give you a lot of hints on fragmentation and proper sizing. Don’t forget either to have multiple control files in separate locations.

It’s the one db where you really can’t live without a support contract from the mothership. And if you have a set of the printed manuals, they will be from a previous version but they are worth their weight in gold. And 95% of them are still applicable. Read all about its system tables. There is a lot to be gained here. For standardisation and easy admin to admin transfer have a look at the old OFA manual.

Its nearest competitor as a multi os db is Sybase, now owned by SAP. A brilliantly designed but more simplistic model. However you’ll have problems getting more than 1 installation (version) onto a single server. Instead it uses what they call userdatabases. This requires strict discipline as an admin so you know which one you are in. But organizing the file storage and backups is a lot simpler.

Its penchant for “go” is not as good as Oracle’s execute command, and its method of dumping output to file is archaic. Like Oracle it’s very sensitive to playing with the kernel settings on unix/linux. Most of its performance is to be gained here. In addition to, like most relationals, a good scheduled reindexing. A good set of Sybase’s own manuals will go a long way for your support needs.

Mssql could have knocked out the other db’s if it hadn’t had such a scaling problem. It depends on a single server, and Windows on top of that, and can only scale upwards at the speed of the hardware development. Windows Datacenter is an option but due to its obscurity and odd Microsoft rules on deployment, Windows Enterprise is really your option. And then we are back to this processor thing again. It is a database that most admins can manage though, even without the scarce manuals. They might not utilize all its potential but any Windows admin can make it run. Just give them a few hints or a small course on simple housekeeping like dumps and scheduled reindexing/reorg.

Mssql’s testing/analyzing tool is very good but it’s not as handy as Oracle’s command line “desc” for analyzing single sql queries. However it does give you a nice way of presenting your findings.

Adabas, owned by the German Software AG, is a story of what could have been. Popular among some German companies/developers, it never reached the popularity of Sybase. If you have seen it, it’s probably because you had a system from a German company that was based on it. Very simple to manage, you don’t even need a manual to start, stop and back up this one. Low cost and flexible. It’s ripe for a large multinational to take it over. Somebody with a long reach, belief and financial muscle to push it into the limelight.

Mysql is the developers’ favourite due to their perception that it’s “free”. Now owned by Oracle. There is no such thing as a “free lunch” however. What you don’t pay for the software itself you will, due to its popularity among specialists, pay for in admins. I recommend testing your restores frequently, especially when it comes to getting back the last data entered. Ripe for an organized set of admin tools. Oracle has a long way to go, and lots of opportunity for add on profit making.

A problem with all relationals is that they are good for adding and picking/filtering small amounts of data and creating automation for repeated actions, but when you reach a certain level of reads it’s better to forget about the indexing. When that happens the old ways are better. The db’s security against data loss also makes them vulnerable to slowdown by locks and errors from deadlocks. This is why, if you have to read all the inputs/data, it’s faster to use the file system directly for your (interim) storage, without the overhead. Many large players do.

Prioritising when everybody is screaming

We have all been there, just one of those days when things go wrong. And Sod’s law says they all come together. Now you need to prioritise. If you have done your preparations this should be easy in your head. You have your list of systems in prioritised order as a result of their importance to the company and the immediacy of the effect of downtime. Adjusted for top management priority and current interest.

Let the admins get on with the job, see if they need additional outside help, and keep the top brass away. Most fixes are based on the rolling half hour. It will take half an hour to know if this will work before we discover the next issue, or try the next possibility. Be ready to run parallel avenues or make sharp decisions. If your systems are properly admin’d / backed up there is usually a fast way or a slow way to restore. Problem is, when do you abandon the fast way and go with the more secure but slow?

Avoid falling into the trap of helping the biggest nuisance user first, or the director’s darling. The dangerous complainer is the one that says nothing to your face but complains without you knowing, and without you having a chance to respond. And their argument will stand if their system should have had priority. Many systems are important but they can take a certain amount of downtime. How many accountants do you see outside hours or at weekends, outside of the budget and reporting cycles? Use your urgency/dependency listing from the DR plan to help you. Think of when they normally schedule upgrades. You will also thank yourself for not storing all the eggs in one basket or all the data on the same san.
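
If you keep that listing in a form a script can read, the restore order on a bad day falls straight out of it. A minimal Python sketch; the system names, the hours and the field names are invented for illustration, and the ordering rule (least tolerable downtime first, then importance) is just one reasonable reading of the above.

```python
from dataclasses import dataclass

@dataclass
class System:
    name: str
    importance: int            # 1 = most important to the company
    tolerable_downtime_h: int  # how long the business can live without it

def restore_order(systems, down):
    """Names of the systems that are down, most urgent first."""
    affected = [s for s in systems if s.name in down]
    affected.sort(key=lambda s: (s.tolerable_downtime_h, s.importance))
    return [s.name for s in affected]

# Hypothetical extract from a DR plan.
dr_list = [
    System("finance", importance=2, tolerable_downtime_h=48),
    System("booking", importance=1, tolerable_downtime_h=1),
    System("email", importance=3, tolerable_downtime_h=4),
]
print(restore_order(dr_list, down={"finance", "email", "booking"}))
# ['booking', 'email', 'finance']
```

Note that who shouts loudest is not one of the fields.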

Recruiting, hints for managers out hiring

First find out what you are looking for. If you are hiring for a manager, are you sure you want the same as the last one? He/she might have worked out well but their field speciality is probably well covered for now. Maybe the next one should be slightly different.

Time from announcement/advertisement to first interview should be as short as possible. Especially if you recruit for tech jobs where there are many companies looking for the same candidates. How many times has your company lost out on a candidate that is no longer available? How many have turned down an offer of interview? If the number is high you need to look at your process again.

If your plan is for more than 2 rounds of interview your recruiting is not as efficient as it could be and you are likely to lose out on the best candidates. Did you weed out enough of the chaff by reading the cv’s thoroughly, or are you wasting everyone’s time by just skimming them for the first time at the interview? Large multinationals are big sinners in having many rounds of interviews. Is that a sign of too many corporate layers = bureaucrazy? Are there too many people involved in your decision process? However if you, the manager, are not technical, bring an expert from your team. It gives them the chance to meet their potential future colleague.

When doing the interview, do you do the “take me through your cv” thing? That means you have to fit it to the job. An alternative approach could be “take me through samples of your experience for each of the requirements in our job description”. This will give the candidate the opportunity to bring in more relevant stuff, and will let you see if they can translate their earlier experience to their new tasks.

Do you use a technical test already at first interview. It will help you confirm your initial opinion, and you can have more experts evaluating it, covering a larger technical field. There is nothing wrong with programming on paper. Many universities still use it for their exams. Personally I don’t see any problem with manuals, helping aids or mobile phones either. If they can get assistance at the test, they can get it in their work, and what you really want is somebody that can complete the job.

There is nothing wrong with testing for all the nice to haves as well. Remember one thing, if the candidate knew the answer to all the test questions, the test wasn’t hard enough.

When you have done a few interviews you know the do’s and don’ts. Bring personnel in on the second round, reducing the times you have to wait for their availability. And they don’t really need to meet anyone that isn’t to be hired. One of the biggest delays can be organizing a time that suits everybody. Therefore bring as few interviewers as possible, but always minimum 1 other for legal reasons. And it gives you thinking time. If you are not the final decision maker, think about whether you yourself need to see the candidate more than 1 time, and leave the final interview to the decision maker and personnel. I would suggest just 2 candidates for second rounds, just so there is a choice. And you don’t have to send forward anyone you can’t live with yourself.

If you are unsure of a candidate, or rather you think there could be a better candidate out there whose cv you haven’t seen yet, and the potential candidate is free on the market at the moment, take a chance. That’s what probation is for. It takes less ruthlessness to trial somebody currently out of work than somebody that has to quit their current job. And your deal with the recruitment agency should always include a step down ladder in fee if a candidate is later found not suitable.

Lastly, give a thought to all those who were unsuccessful. It won’t cost you much to tell them, but it will mean a lot to them to know.