Friday, October 9, 2009

Careful what you post on LinkedIn

Interesting article on 128-bit support in Windows 8 and 9. More interesting, though, is how the information leaked out: it was posted by a developer as part of their LinkedIn profile. It kind of makes you wonder what other proprietary information is buried in the profiles of LinkedIn members.

Friday, September 18, 2009

Nagios Remote Monitoring Unix/Linux with check_by_ssh

Nagios (http://www.nagios.org) can do full remote monitoring by SSHing into the remote server and running the Nagios plugins on that server. For it to work, SSH has to work without interaction (key-based authentication) and only the plugins need to be installed on the remote server. This monitoring does not require any special privileges (i.e., it doesn't need to run as root).

On the remote (to be monitored) server (assumes a user name of nagmon):
useradd -d /export/home/nagmon -m nagmon *create our local monitoring service account

passwd nagmon *the password you set is unimportant; we won't ever use it to log in, you just need to set one to enable the account

mkdir /export/home/nagmon/.ssh

chown nagmon /export/home/nagmon/.ssh


On the nagios (monitor) server:
ssh-keygen -t rsa -b 4096 -f /etc/ssh/nagmon_rsa *generate an RSA key pair and save it to /etc/ssh/nagmon_rsa; when prompted for a passphrase just hit enter
cat /etc/ssh/nagmon_rsa.pub *this is the public key that will be used to authenticate the SSH session; copy all the text in the file
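
If your system ships ssh-copy-id, you can use it in place of the manual paste into authorized_keys below; a quick sketch, assuming the monitored host is reachable as remote.example.com (substitute your own hostname):

ssh-copy-id -i /etc/ssh/nagmon_rsa.pub nagmon@remote.example.com *you'll be prompted for the nagmon password this one time; after that the key takes over

Either way, the end result is the public key sitting in nagmon's authorized_keys file on the remote server.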


On the remote (to be monitored) server:
vi /export/home/nagmon/.ssh/authorized_keys *paste the text from the previous command into the file, then save and exit the file

chown nagmon /export/home/nagmon/.ssh/authorized_keys *make sure our service account can read the key file

chmod 600 /export/home/nagmon/.ssh/authorized_keys *ssh will reject the connection if the proper permissions are not set on the file

mkdir -p /usr/lib/nagios/plugins *install/copy the Nagios plugins into this directory, usually by copying them from the Nagios server. Make sure the plugins are compiled for the system/processor you are using.
chmod 755 /usr/lib/nagios/plugins/*
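
Before touching the Nagios configuration, it's worth verifying from the Nagios server that the key-based login works and that a plugin actually runs. A quick sketch, assuming the monitored host is reachable as remote.example.com and the Nagios daemon runs as the nagios user (adjust both for your environment, and make sure that user can read the key file):

sudo -u nagios ssh -i /etc/ssh/nagmon_rsa nagmon@remote.example.com '/usr/lib/nagios/plugins/check_disk -w 25% -c 10% -p /usr'

You should get back a single status line (something like "DISK OK - free space: ...") without being prompted for a password. If you are prompted, re-check the ownership and permissions on authorized_keys above.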

*Example Nagios Config (monitoring a remote disk):
Add to the file that contains your command configurations:
define command{
        name            check_remote_disk
        command_name    check_remote_disk
        command_line    /usr/local/nagios/libexec/check_by_ssh -H $HOSTADDRESS$ -2 -C '/usr/lib/nagios/plugins/check_disk -w $ARG2$ -c $ARG3$ -p $ARG1$' -l nagmon -i /etc/ssh/nagmon_rsa
        }
Add to the file that contains your service configurations:
define service{
        use                     local-service   ; this is a template, yours will probably be named differently
        host_name                               ; fill in the host to monitor
        service_description     usr partition
        check_command           check_remote_disk!/usr!25%!10%
        }
That's it! You can run any of the local-check Nagios plugins over the SSH connection this way and return the results to Nagios.
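
If a check comes back UNKNOWN or CRITICAL, running check_by_ssh by hand as the Nagios daemon user will show you the raw output. A sketch mirroring the command definition above, with 192.0.2.10 standing in for your monitored host:

/usr/local/nagios/libexec/check_by_ssh -H 192.0.2.10 -2 -C '/usr/lib/nagios/plugins/check_disk -w 25% -c 10% -p /usr' -l nagmon -i /etc/ssh/nagmon_rsa

The usual failure modes are the daemon user not being able to read the private key file, or a plugin path that differs on the remote side.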

Sunday, September 13, 2009

Updating Vendor Certifications

I've been working on getting my certifications up to date, mainly because I see certifications as a good benchmark for keeping your general knowledge current. It's easy to get deep into a specific function of a technology, but certifications make you step back and look at the big picture again.

But I digress. I just re-certified my CCNA (Cisco Certified Network Associate) and it made me think a little. I did the same certification about a decade ago, and frankly the test was significantly easier and covered a lot less material back then. I suspect Cisco raised the bar as a result of introducing its new entry-level CCENT certification. But as a hiring manager, my impression of the CCNA was clearly off base prior to retaking it this year. We were making this certification a requirement for our entry-level networking folks, and I don't think that was appropriate.

So how often do vendors update certifications by making them significantly more difficult, and how do they get the word out on these changes to hiring managers? Now I know, but I clearly hadn't gotten the message before completing the test myself.


*Example: I still recall being annoyed on my first CCNA that I had learned STP inside and out, and the only question I got on it was something like "what protocol would you use to prevent layer 2 loops?" For the recent CCNA exam you had to know VTP, trunking protocols, port trunk modes, STP, RSTP, etc., so there was clearly a lot more content.

Monday, May 18, 2009

Integrating IT Services

It's been one of my pet peeves for a while that IT doesn't always seem to eat its own dog food.  Specifically, we extol the virtues of definitive data sources and application integration, but we very rarely use definitive sources or integrate our own systems.

So I haven't been posting much because all of my free time over the last few months has gone into developing something to do just that.  I'm definitely still in the Alpha stage of development but I do have something presentable.  I'm looking for any and all feedback on:

Tuesday, February 10, 2009

Security breach = malpractice?

The FAA had a security breach of non-air traffic control systems that resulted in loss of confidential employee info.

http://www.newsweek.com/id/184051

*Official statement from the FAA:  http://www.faa.gov/news/press_releases/news_story.cfm?newsId=10394

Some interesting points:

-The end of the article implies (without proving) that this was the second breach of this same FAA network and nothing was done the first time.

-Some of the data stolen was encrypted employee medical information. (I'm wondering why the FAA would need to store medical records. Is this common in non-healthcare organizations? Is this data covered by the same standards as health information technology?)


But the reason I wanted to post this was the following sentence: 

"Our information technology systems people need to take a long hard look at themselves and their capabilities. This is malpractice in their world." -Tom Waters president of American Federation of State, County and Municipal Employees Local 3290


So the questions here are:

Is there such a thing as IT malpractice? Is a security breach indicative of IT malpractice? Are multiple breaches proof of malpractice? Let's take these on one by one:

-Is there such a thing as IT malpractice?

malpractice - Mistakes or negligent conduct by a professional person, especially a physician, that results in damage to others, such as misdiagnosis of a serious illness. Damaged parties often seek compensation by bringing malpractice suits against the offending physician or other professional.

I think the key part of the definition here is "professional". While IT as an industry has all of the challenges of any other "professional" industry, we do not have a central body to certify professionals. By that I mean that while there are various vendor and organizational certifications, we do not have a formal licensing body. So I don't believe we meet the legal definition of professionals, which means we are incapable of malpractice.

-Is a security breach indicative of IT malpractice?

delinquency - Failure in or neglect of duty or obligation; dereliction; default: delinquency in payment of dues.

Based on the above, let's substitute the word delinquency for malpractice. Here I think it depends on the specifics of the incident itself. If you maintained good security practices, then you were probably not delinquent. If you did not (leaving the default/easy/no password on your firewalls and routers, not maintaining audit trails, not restricting access rights to the minimum necessary), then I would call that delinquent. If the IT department was delinquent it should certainly be held accountable, but we do not have enough information here to indicate that.

-Are multiple breaches proof of malpractice (delinquency)?

Proof? Probably not. But it is certainly indicative. Any security breach should be followed by a post-mortem investigation and response. Two security incidents using the same attack vector over a period of time would seem to suggest that the post-mortem was not done or wasn't done well and would be a reflection on the IT team.



malpractice. (n.d.). The American Heritage® New Dictionary of Cultural Literacy, Third Edition. Retrieved February 10, 2009, from Dictionary.com website: http://dictionary.reference.com/browse/malpractice

delinquency. (n.d.). Dictionary.com Unabridged (v 1.1). Retrieved February 10, 2009, from Dictionary.com website: http://dictionary.reference.com/browse/delinquency

Lesson Learned: MS Terminal Services - Volatile Memory

Had an interesting issue with Microsoft Terminal Services the other day.  I'm still wrapping my head around this, so there may be some inaccuracies here; I'll do my best to correct them as I find them.  Apparently environment variables are stored in "volatile memory," which can cause problems in applications that use common logins and environment variables.

Specifically, we have a single service account that is used by several thin client devices.  The account logs in to the terminal server automatically and launches an application.  (The application itself requires a login, so our security exposure is tolerable.)  As the application is launched, the %clientname% environment variable is read and passed to the application so that workstation-specific workflows can be configured.

Now the interesting part.  When two or more thin clients log in within a second of each other, they can "steal" each other's names.  This was tied back to the %clientname% environment variable changing between the initial login and the moment %clientname% is sent to the application.  It seems that when the second thin client logs in while the first is launching the application, the second overwrites the environment variables (all within the same user profile, because a shared service account is used), resulting in the second thin client's name being used for both.  So...  environment variables are user-specific, not session-specific.

Workarounds:
1)  Configure different service accounts for each workstation/client.
2)  Require end users to log in with their own credentials rather than using a service account.
3)  Use non-volatile, session-specific values from WMI instead of environment variables.

Wednesday, February 4, 2009

What is a CTO?

Recently I was asked what good qualities in a CTO are...  Following is the somewhat overstated answer I came up with.  Interestingly enough, when I was offered my CTO job I looked for job descriptions and didn't have much luck.  I went into the job knowing I would have to wing it until I figured out how the relationship would work.  Maybe this will make it just a tad easier for the next up-and-coming CTO.

Well, it's a tougher question than you may think, but I'll give it a go. I think most of this is worked out between the CIO and CTO; they figure out their own boundaries and cover for each other, with some specific differences.

I like to think of the relationship in terms of a restaurant. The restaurant manager (CIO) is the public face of the department. (S)He makes sure the customer is happy, that they've gotten what they expected in a timely manner, and that the back of the house is delivering. The head chef (CTO) is responsible for the back of the house. (S)He makes sure all the right components are in the food, that the food is fully cooked and safe, and that there is consistency across meals. To step away from the analogy, the CIO is responsible for politics and departmental direction. The CTO is responsible for the technical direction and vision. Both roles cross over, though. A CTO should be able to step into an IT political mess and bring it back under control, and a CIO should be able to determine if an architecture diagram is flawed.

The big difference is vision versus direction. The CIO should be watching technology as it is today. The CTO should be watching (or creating) technology as it will become. 

All that said, what are the qualities of a good CTO? 

-STRONG ability to talk about technology at the executive level in a non-technical way 
-Solid political (be it internal, vendor, partner or client) understanding and empathy 
-Solid project management 
-Understanding of business goals and challenges 
-Ability to be flexible as things change 
-Established IT leadership both in the trenches and as a manager 
-Ability to make quick correct decisions and take control in an emergency/disaster 
-Passion for your employees and desire to see their careers grow (Mentor) 
-Obsessive compulsive need to stay on top of the latest technology trends and offerings 
-Willingness to work well over 40 hours a week 
-Last but not least... a solid understanding of technology across IT specializations. You will be the one keeping IT directors in check, so you will need to know networking, server administration, development, project management, help desk operations, database management, etc. at a specialist's level; well enough to know when someone is trying to pull the wool over your eyes. If you don't feel that you could step in and cover for any of your directors in an emergency, you are probably not qualified to be a CTO (yet). 

I'd be interested to see what some of the other CTOs out there have to say about this. Like I said, every CTO/CIO relationship is different, so I expect there would be some variance from my comments. 

Hope that helps!

Friday, January 30, 2009

Impacts of the Recession on IT

In my opinion....

Companies will get a lot more focused on the bottom line. That will mean staff reductions, process optimization and re-prioritization. I suspect we IT folks will be a bit safer than the rest, as those are the particular areas IT is seen as shining in.

Some specialties that aren't seen as having a direct impact on revenue (security, auditing, QA/QC and performance testing come to mind) will be more likely to get outsourced, reduced or eliminated entirely (by giving those job responsibilities to other specialties). Many departments will return to shoot-from-the-hip/cowboy IT, where things are moved to production with minimal testing, documentation and oversight.

IT systems/support, internal development, interface engineering and other direct-revenue roles will see greater workloads as they are expected to introduce revenue (via optimization or new development) and reduce expenditures.

Open source will see a spike as IT systems folks try to do more with less.

IT management (me, ugh...) will see reductions, especially where salaries are high and performance isn't seen as in line with business goals (which is common in organizations that don't "get" IT, but is also a reflection on the individual manager).

Help desks will see reductions as systems folks' responsibilities start to include top-level end-user support.

Large companies and organizations will be more likely to cut IT as a percentage across specializations; small and medium ones will cut where they see the least impact on revenue.

Of course, as I said, that's all just my opinion; we'll see how this really plays out. Being an executive, I'm already on the IT "fringe," and at times like this the safest place is in the core (as long as you are competitive among your peers). I do have faith that things will turn around though. We just need to make it to that light at the end of the tunnel and remember that a lot of folks have it a lot harder than us IT people.

Thursday, January 29, 2009

SQL Server Object Naming Convention

Over the years as a DBA I've developed a naming convention for my database objects. It certainly needs a little work, but I do think it makes code a little easier to read and debug.


Top Level Objects:

udb_ - User DataBase - A database in Microsoft® SQL Server™ 2000 consists of a collection of tables with data, and other objects, such as views, indexes, stored procedures, and triggers, that are defined to support the activities performed with the data. Before objects within the database can be created, you must create the database and understand how to change the settings and the configuration of the database. This includes tasks such as expanding or shrinking the database, or specifying the files used to create the database.


In Database Objects:

apr_ - APplication Role - A SQL Server role created to support the security needs of an application.

def_ - User DEFault - Defaults specify what values are used in a column if you do not specify a value for the column when inserting a row. Defaults can be anything that evaluates to a constant.

dgm_ - User DiaGraM

idc_ - InDex Clustered - An index in which the logical order of the key values determines the physical order of the corresponding rows in a table.

idx_ - InDeX - In a relational database, a database object that provides fast access to data in the rows of a table, based on key values. Indexes can also enforce uniqueness on the rows in a table. SQL Server supports clustered and nonclustered indexes. The primary key of a table is automatically indexed. In full-text search, a full-text index stores information about significant words and their location within a given column.

rul_ - User RULe - A database object that is bound to columns or user-defined data types, and specifies which data values are acceptable in a column. CHECK constraints provide the same functionality and are preferred because they are in the SQL-92 standard.

tbl_ - User TaBLe - A two-dimensional object, consisting of rows and columns, used to store data in a relational database. Each table stores information about one of the types of objects modeled by the database.

trg_ - User TRiGger - A stored procedure that executes when data in a specified table is modified. Triggers are often created to enforce referential integrity or consistency among logically related data in different tables.

udf_ - User Defined Function - In SQL Server, a Transact-SQL function defined by a user. Functions encapsulate frequently performed logic in a named entity that can be called by Transact-SQL statements instead of recoding the logic in each statement.

udt_ - User defined Data Type - A data type, based on a SQL Server data type, created by the user for custom data storage. Rules and defaults can be bound to user-defined data types (but not to system data types).

uro_ - User ROle - A SQL Server security account that is a collection of other security accounts that can be treated as a single unit when managing permissions. A role can contain SQL Server logins, other roles, and Windows logins or groups.

usp_ - User Stored Procedure - A precompiled collection of Transact-SQL statements stored under a name and processed as a unit. SQL Server supplies stored procedures for managing SQL Server and displaying information about databases and users. SQL Server-supplied stored procedures are called system stored procedures.

uvw_ - User VieW - A database object that can be referenced the same way as a table in SQL statements. Views are defined using a SELECT statement and are analogous to an object that contains the result set of this statement.


Data Types:

bin_ - BINary - Fixed-length data of n bytes. n must be a value from 1 through 8,000. Storage size is n+4 bytes. 

bit_ - BIT - Integer data type 1, 0, or NULL.

chn_ - CHar uNsigned - Fixed-length Unicode character data of n characters. n must be a value from 1 through 4,000. Storage size is two times n bytes. The SQL-92 synonyms for nchar are national char and national character.

chr_ - CHaR - Fixed-length non-Unicode character data with length of n bytes. n must be a value from 1 through 8,000. Storage size is n bytes. The SQL-92 synonym for char is character.

dec_ - DECimal - Fixed precision and scale numbers. When maximum precision is used, valid values are from - 10^38 +1 through 10^38 - 1. The SQL-92 synonyms for decimal are dec and dec(p, s).

dtm_ - DateTiMe - Date and time data from January 1, 1753 through December 31, 9999, to an accuracy of one three-hundredth of a second (equivalent to 3.33 milliseconds or 0.00333 seconds).  

dts_ - DateTime Small - Date and time data from January 1, 1900, through June 6, 2079, with accuracy to the minute. smalldatetime values with 29.998 seconds or lower are rounded down to the nearest minute; values with 29.999 seconds or higher are rounded up to the nearest minute. 

flt_ - FLoaT - Is a floating point number data from - 1.79E + 308 through 1.79E + 308. n is the number of bits used to store the mantissa of the float number in scientific notation and thus dictates the precision and storage size. n must be a value from 1 through 53.

img_ - IMaGe - Variable-length binary data from 0 through 2^31 - 1 (2,147,483,647) bytes. 

inb_ - INt Big - Integer (whole number) data from -2^63 (-9223372036854775808) through 2^63-1 (9223372036854775807). Storage size is 8 bytes.

ins_ - INt Small - Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767). Storage size is 2 bytes.

int_ - INT - Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is integer.

iny_ - INt tinY - Integer data from 0 through 255. Storage size is 1 byte.

mns_ - MoNey Small - Monetary data values from - 214,748.3648 through +214,748.3647, with accuracy to a ten-thousandth of a monetary unit. Storage size is 4 bytes. 

mny_ - MoNeY - Monetary data values from -2^63 (-922,337,203,685,477.5808) through 2^63 - 1 (+922,337,203,685,477.5807), with accuracy to a ten-thousandth of a monetary unit. Storage size is 8 bytes.

num_ - NUMeric - Fixed precision and scale numbers. When maximum precision is used, valid values are from - 10^38 +1 through 10^38 - 1. The SQL-92 synonyms for decimal are dec and dec(p, s).

rel_ - REaL - Floating point number data from –3.40E + 38 through 3.40E + 38. Storage size is 4 bytes. In SQL Server, the synonym for real is float(24). 

tst_ - TimeSTamp - timestamp is a data type that exposes automatically generated binary numbers, which are guaranteed to be unique within a database. timestamp is used typically as a mechanism for version-stamping table rows. The storage size is 8 bytes.

txn_ - TeXt uNsigned - Variable-length Unicode data with a maximum length of 2^30 - 1 (1,073,741,823) characters. Storage size, in bytes, is two times the number of characters entered. The SQL-92 synonym for ntext is national text. 

txt_ - TeXT - Variable-length non-Unicode data in the code page of the server and with a maximum length of 2^31 - 1 (2,147,483,647) characters. When the server code page uses double-byte characters, the storage is still 2,147,483,647 bytes. Depending on the character string, the storage size may be less than 2,147,483,647 bytes.

uid_ - Unique IDentifier - A globally unique identifier (GUID).  

var_ - VARiant - A data type that stores values of various SQL Server-supported data types, except text, ntext, image, timestamp, and sql_variant.  sql_variant may be used in columns, parameters, variables, and return values of user-defined functions. sql_variant allows these database objects to support values of other data types.

vby_ - VarBinarY - Variable-length binary data of n bytes. n must be a value from 1 through 8,000. Storage size is the actual length of the data entered + 4 bytes, not n bytes. The data entered can be 0 bytes in length. The SQL-92 synonym for varbinary is binary varying.

vch_ - VarCHar - Variable-length non-Unicode character data with length of n bytes. n must be a value from 1 through 8,000. Storage size is the actual length in bytes of the data entered, not n bytes. The data entered can be 0 characters in length. The SQL-92 synonyms for varchar are char varying or character varying.

vcn_ - VarChar uNsigned - Variable-length Unicode character data of n characters. n must be a value from 1 through 4,000. Storage size, in bytes, is two times the number of characters entered. The data entered can be 0 characters in length. The SQL-92 synonyms for nvarchar are national char varying and national character varying. 


Other Objects:

cur_ - CURsor - An entity that maps over a result set and establishes a position on a single row within the result set. After the cursor is positioned on a row, operations can be performed on that row, or on a block of rows starting at that position. The most common operation is to fetch (retrieve) the current row or block of rows.

lsv_ - Linked SerVer - A definition of an OLE DB data source used by SQL Server 2000 distributed queries. The linked server definition specifies the OLE DB provider required to access the data, and includes enough addressing information for the OLE DB provider to connect to the data. Any rowsets exposed by the OLE DB data source can then be referenced as tables, called linked tables, in SQL Server 2000 distributed queries.

dts_ - Data Transformation Services package - An organized collection of connections, Data Transformation Services (DTS) tasks, DTS transformations, and workflow constraints defined by the DTS object model and assembled either with a DTS tool or programmatically.

job_ - SQL Server Agent JOB - A specified series of operations, called steps, performed sequentially by SQL Server Agent.


*Object definitions copied from SQL 2000 books online ©1988-2000 Microsoft Corporation. All Rights Reserved.

Real World Disaster Recovery

This happened back in June 2006 but I recently posted it to the Healthcare Technology Alliance and figured I would post it (and update it) here as well.

I had the misfortune to be in the third week of my job as the CTO of a major physician practice when our only data center was wiped out. Our fire suppression vendor had brought in a trainee who accidentally hit the button without disabling the live system. The system itself was designed for warehouses, not data centers, and released a fine powder that mixed with the water vapor in the room to form caustic cement on our equipment. I opened up one server and every bit of copper had turned green. To my horror, the only copy of our disaster recovery procedure was stored on a file server in the data center, and the previous night's tapes had not been taken offsite. 
I held a quick training session for everyone from temp help desk staff to IT directors on how to clean servers, ordered a bulk next-day shipment of servers and a replacement SAN, explained our situation to our vendors and asked for help, called our sister institutions and explained our situation, arranged for a professional disaster recovery cleaning service to come in and clean the room, and talked to executive management to give them the real-world recovery scenario and timelines so they could plan for their groups. 
Four major events expedited our recovery. Dell (they really came through for us) sent us pretty much every spare part they had in their Texas depots and a couple of techs to help with the recovery, even though our service agreements didn't cover it. A sister institution was able to loan us a core switch. We were able to use Acronis to take images of servers as we brought them online and restore them to different (clean) hardware (and then use that server for parts for the next). The IT department, from top to bottom, really pulled together and worked for three days with little sleep to get things back online. 
We were able to bring all revenue cycle systems online within 48 hours and were completely back online within 72 hours.

Lessons learned: 
-Know your data center. Everything: power consumption, heat load, air conditioning, fire suppression systems, UPS, wet/dry pipes, condition of the roof. Don’t let facilities, your architect or engineer tell you what you need. Check and make sure these components are really up to your needs. 
-Disaster recovery procedures should be stored and kept up to date in multiple safe locations. 
-Backups and procedures around them (such as taking offsite) need to be audited to ensure they are restorable and done in a consistent manner. 
-Most equipment service agreements (even platinum) do not cover acts of God. Insurance does, but you will not get an immediate payout. Make sure you have enough capital sitting around to make purchases in a disaster scenario. 
-Core equipment (big iron) is generally not available for retail purchase on a next-day basis. Your service agreement doesn't cover acts of God, so you won't be able to get it from the depot. If you absolutely cannot survive without that piece of equipment, buy two and store one offsite. 
-Getting an exact duplicate (down to the component level) of a commodity server is generally not possible. Invest in a product (like Acronis) that can restore a backup to non-like hardware. 
-Make sure your IT disaster recovery plan is mirrored by an organizational disaster recovery plan. The business should have documented communication, employee placement and documentation methods for unplanned downtime. They should also have a procedure for getting up to date when IT systems become available. 
-If you receive spare parts out of the kindness of a vendor's heart, make sure you document which ones you use and store them in a way that what you don't use can be returned. 
-Know the financial impact of downtime on your organization. It was very easy to justify building (and successfully testing) our DR hot site for our critical systems when we did our post-mortem and realized we had lost slightly over half a million dollars in revenue.
-Keep your head... In a disaster it sometimes makes sense to take risks; just make sure they are extremely well-calculated ones.

Fannie Mae Security Breach



District of Maryland Complaint

So a Unix scripting consultant with root access to 4,000 of Fannie Mae's servers (somewhat doubtful that was 4,000 Unix servers, but maybe the script was cross-platform) inserts code into a SAN connectivity monitoring script, on the day he is let go, designed to embarrass Fannie Mae and wipe their data.

Pertinent info (this all happens on October 24th): he was let go at 2:30 pm, started working on the "virus" at 2:53 pm, his last known access was at 4:30 pm, and he returned his laptop at 4:45 pm.  "Late in the evening" his access was terminated.

On October 29th a senior Unix administrator finds the "virus" by accident.

The "virus" was set to go off on January 31st.

Wow, just wow...   He had a little less than a two-hour window to conceive of, write and implement his "virus".  So either he had started ahead of time, he is a pretty solid scripter, or it wasn't very well written/proofed code.  I tend to believe options 1 or 2, as his code did manage to run for five days without raising any alarms and he couldn't have had a lot of testing time.

He started working on it 23 minutes after he was let go and completed it 97 minutes later.

Lessons learned?
-For people with this level of access (or pretty much any tech worker) their access should be revoked during their exit interview.
-After their exit interview they should be escorted at all times and certainly should not be allowed to finish out the day.
-For an organization this size (4,000 servers) it makes sense to put scripts like this in version control and do occasional audits comparing version control against the production environment (see the sketch after this list).
-Any senior administrator worth their salt (which it appears they had) is going to check the logs and make sure this guy didn't do anything suspicious in that 97 minutes.  Presumably he didn't spend a whole lot of time saying goodbye...
-This one wouldn't have helped in this case, but people should only have the absolute minimum level of access that they need to perform their jobs efficiently.
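
On the version control point, the audit doesn't have to be elaborate. A minimal sketch, assuming production scripts live in /usr/local/scripts and a read-only checkout of the repository sits in /var/checkouts/scripts (both paths hypothetical; adjust for your environment):

#!/bin/sh
# Compare production scripts against the version-controlled copies and
# mail any differences to the admin team for review.
REPORT=/tmp/script_audit.$$
diff -ru /var/checkouts/scripts /usr/local/scripts > "$REPORT" 2>&1
if [ -s "$REPORT" ]; then
    mail -s "Script audit: production differs from version control" admins@example.com < "$REPORT"
fi
rm -f "$REPORT"

Run from cron, a report like this could have flagged the altered SAN monitoring script well before its January 31st trigger date.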

Tuesday, January 27, 2009

Cloud Computing

There has been much buzz about cloud computing of late, to the point where I've spent a little time looking into the technology and some of the real-world implementations like the Amazon Elastic Compute Cloud (EC2).  I'll use this posting to collect and grow my thoughts on the subject.

What is the difference between "cloud" and "grid" computing?:  
-Grid computing is a grouping of computers using common software to share processing load.  Cloud computing is similar in that it shares a load across infrastructure, but different in that it does not require a common software package; the infrastructure "lives" on the internet and is presented as a server or series of servers to the end user.  For example, a computer "grid" could include an application installed on an individual's PC that is computing and phoning home, whereas a computer "cloud" is entirely internet based.

What are the advantages of the cloud?:  
-Presumably a reduction in actual IT resource costs.  Specifically, the cloud vendor takes care of backups, procurement, monitoring and hardware failures: basically all the day-to-day IT work outside of the application itself.  This should allow for smaller IT departments since those functions are handled.  It does not replace the need for a knowledgeable person to set up special configuration within servers and to install and manage applications.  The cloud will also be available anywhere, with the big pipes of the cloud vendor supporting it.

What are the disadvantages of the cloud?:  
-You still need folks to set up and maintain the applications, even if not the day-to-day tasks.  Pricing is dynamic (at least for EC2), depending on what platform you require and your bandwidth usage.  Security...  Can you safely store regulated data (HIPAA, 21 CFR Part 11, Sarbanes-Oxley) in the cloud?  Can you reliably audit access to the cloud, even by technicians at the cloud vendor?  If your data is all in the cloud, you may need big pipes of your own to access it.

Thoughts?:  
I would probably be willing to experiment with cloud computing for non-mission-critical, non-security-sensitive (from a regulatory perspective) applications.  I wouldn't consider moving my core infrastructure to the cloud at the moment, but I will certainly be watching.

Questions?:
-How do you span single sign-on/integrated authentication across locally installed and cloud-installed servers?  Particularly when you are paying per bit on the wire.