How does one manage a schedule when no one has any idea how long things should take?
//==================================================
This case recently came up: what do you do when the tech director is too busy to supervise everyone and the programmers are all too busy to report on their progress, so missed deadlines never get reported properly?
This is an emergency situation and all but guarantees a late release. However, there are several strategies a producer can use to mitigate the disaster... especially if the producer has little or no domain knowledge.
Before I discuss those, let's talk about what information must be reported and gathered before anything can be done effectively.
Reporting and gathering:
You have already blown a sprint or two discovering that your crew is fairly junior and that the tech director is too busy to help estimate adequately. Worse, estimating the work of juniors is fraught with absurdity and inadequacy. You need to figure out what your velocity looks like and what is impeding development. There are plenty of techniques, covered below, for increasing your velocity, but right now measuring it is what matters.
The first question to ask is whether your juniors can even perform the work. If your team consists of JavaScript programmers and you are building a web app, you may be in decent shape. If, however, you are building embedded controllers, you are in trouble. In the second situation you must inform your management immediately that you will probably never finish the task. This is a situation where hoping that the JS programmers can learn the tech will NOT pay off. Treat it with extreme prejudice and pessimism. People nearly always head into these situations believing that it will "all just work out" or that, with proper mgmt, you should be able to 'project manage' this into success. That is a classic mgmt technique for making you feel inadequate if you can't properly manage people who are clearly not up to the task. Do not fall for the trap; you are a pro and your professional opinion should matter.
Now, having one qualified junior on a team where the rest are not still spells disaster: neither you nor the team can complete the task without a modicum of skill across the group. This is another evaluation that must be made with prejudice; the Dunning-Kruger effect all but guarantees that juniors believe they are better than they are. You really need outside help, and this is where outside teams or hiring come in. Talk to outside management or other groups to find out what engineering talent is available. This can mean hiring temporary experts, bringing people in part time from other groups, and so on, though it often goes nowhere simply because the labor is not available.
Once you have settled that, you need to identify who has which skills on the team and how to have the team members share that knowledge with each other.
Seek immediate instruction to raise the average ability of the team.
Techniques to reduce risk:
1) Bring in a resident expert
2) Break tasks down into minutiae. Tasks must be absolutely clear for more junior people; experienced people can deal with more ambiguity.
3) Check in with production every 3 hours (effective, but disruptive)
4) Hire a consultant
5) Pair programming
6) Specialized training
7) Scrum and estimation training
Estimation itself
1) All overages are due to bad estimation... when the programmer says "only another hour or two," you should immediately treat this as a lost day.
2) All estimates should be doubled... at least. 4x is a better multiplier at the beginning, until you better understand the programmers' ability to estimate.
Reporting to mgmt
1)
Engineering and Science
Cover software engineering, engineering science, physics, linguistics, and other interesting topics.
Saturday, March 14, 2015
Software engineering schedule estimation
Intro
What is the biggest concern of studio heads and producers? Labor pool and finding good talent. But after that, usually schedule is the most important. This is strongly related to budget and engineering uncertainty. Basically, having decent estimates, even in an agile schedule, is essential.
But squeezing good estimates out of programmers is difficult. The Dunning-Kruger effect almost guarantees that programmers will overestimate their ability to give good time estimates. Those who are good will give +50% estimates (150% of a good guess) and those who are terrible will always underestimate the tasks in order to "get the job". What should a producer or scrum master do?
Well, if you have any historical knowledge of the programmers involved, then you can use this as a tool for guiding future estimates by programmers with less-than-perfect scheduling history. Also, a word or two before every sprint planning session (or work estimate bidding) can work in your favor to clear the air and remove any pressure to "get the job" by underestimating. Lastly, breaking tasks into simpler tasks that are achievable and demonstrable is key. I will cover all of these in some detail below.
Firstly, estimation is a difficult thing and engineers have historically had dubious results. There is an entire Wikipedia page dedicated to the topic: http://en.wikipedia.org/wiki/Software_development_effort_estimation
Quoting from that page, this captures the essence of the problem:
- It's easy to estimate what you know.
- It's hard to estimate what you know you don't know.
- It's very hard to estimate things that you don't know you don't know.
Secondly, estimation can be overdone, with entirely too long spent mitigating possible risk and not enough time spent working. For software engineers who have a good track record of estimation, you can simplify your effort. Take that programmer's best estimate and multiply it by 1.5... the additional 50% will cover nearly all overages. You will be wrong on occasion, but you will usually be under schedule and finish early. Then both you and the programmer will look good historically. Another way of putting this is that people will trust you and your estimates. For all estimates, always assume a 6-hour work day.
But for those who have a track record of badly estimating, make sure to involve other engineers, ask hard questions (covered later), and then double that estimate. Those who are bad at estimating are also often bad with distractions and time, so they may take longer. If they demonstrate that they are better than the double-estimate over time, then you can reduce that number to 150% and maybe remove the need for the opinion of other experienced programmers.
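A minimal sketch of those padding rules (the function names, the choice of 1.5x/2x, and the 6-hour day come from the rules above; the code structure is mine, not a standard formula):

#include <cmath>
#include <iostream>

// Hours of real, focused work available per day (per the 6-hour rule above).
constexpr double kWorkHoursPerDay = 6.0;

// Padding multipliers: 1.5x for engineers with a good estimation track
// record, 2.0x (or more) for engineers who are unproven or historically off.
double padEstimateHours(double rawHours, bool trustedEstimator) {
    const double multiplier = trustedEstimator ? 1.5 : 2.0;
    return rawHours * multiplier;
}

// Convert padded hours into schedule days, rounding up to whole days.
double scheduleDays(double rawHours, bool trustedEstimator) {
    return std::ceil(padEstimateHours(rawHours, trustedEstimator) / kWorkHoursPerDay);
}

int main() {
    // "Two days of work" (12 focused hours) from a trusted estimator becomes
    // 3 schedule days; the same guess from an unproven estimator becomes 4.
    std::cout << scheduleDays(12.0, true) << " days (trusted)\n";
    std::cout << scheduleDays(12.0, false) << " days (unproven)\n";
}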
Thirdly, estimating big tasks without decomposing them is impossible. Saying that it will take 18 months to build a car from scratch is at best a wild guess (and wrong), and at worst, if you've made promises to deliver, likely to cause you a lot of pressure and overtime. Do yourself a favor and make sure that all time estimates are under 5 days, and ideally 3 days or less.
Asking the hard questions:
People tend to shy away from the hard questions and they stick to "how long will it take?" This question, by itself, is not quite useless... but it's not very helpful. For any given task, you should be able to ask at least the following questions:
Is this your expertise?
Do you need to look anything up or can you get to work immediately?
Is there anything that will prevent you from working, like unfinished subsystems, existing bugs, etc.?
Is there any integration with other systems? Do you know which systems those are?
Who else needs to help you?
Can you work in an isolated sandbox? If not, how likely are you to break the build?
Will the other people you need to contact be available? Vacations? Sick time?
When you commit your code, will it be done or will there be multiple checkins?
Assuming that you are distracted by other things, how much time could this take?
When you have an engineer who is historically bad with estimation, or you are new to a project and have no history on the people involved, use this handy guide for guessing the amount of work. Note that all of these estimates are deliberately high. We are not interested in exact estimates, and estimates that are too low do us all a disservice. We shoot a little high until we have a better sense of the estimation ability of the engineers involved.
Given that producers generally have little insight into software development tasks, use this as an estimation guide:
- "One hour task": 3 hours
- Small task: 1 day
- Easy task: 2 days
- Medium task: 3-5 days; when in doubt, use 5
- Large task: don't bother to estimate. Break it up into smaller deliverables that can each be estimated at less than 5 days.
Now add the following for additional unknowns:
- Working with an external library + 5 days
- Working with a previously unused external library + 10 days
- Integration into a memory system + 5 days
- Integration into an existing graphics system + 5 days
- Integration into a 3D graphics system + 10-15 days
- New tool chain + 5 days
- New complex tool chain + 15 days
- Conversion of code to another language + 3 weeks minimum
So, those rough guidelines should give you an idea of the real-world numbers to use when estimating tasks. Of course, given an expert engineer with expertise in the particular area, these numbers can be reduced... by about 50%.
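The guide above can be captured in a tiny lookup so every producer pads tasks the same way. The category names and day counts are taken from the tables above; the code structure and field names are illustrative:

#include <iostream>
#include <string>

// Base estimates from the guide above (in days; "one hour task" = 3 hours = half a 6-hour day).
enum class TaskSize { OneHour, Small, Easy, Medium };  // Large tasks get decomposed, not estimated.

double baseDays(TaskSize size) {
    switch (size) {
        case TaskSize::OneHour: return 0.5;
        case TaskSize::Small:   return 1.0;
        case TaskSize::Easy:    return 2.0;
        case TaskSize::Medium:  return 5.0;   // "when in doubt, use 5"
    }
    return 0.0;
}

// Adders for the unknowns listed above.
struct Unknowns {
    bool externalLibrary = false;        // +5 days
    bool unusedExternalLibrary = false;  // +10 days (previously unused library)
    bool memorySystem = false;           // +5 days
    bool graphicsSystem = false;         // +5 days
    bool graphics3D = false;             // +10-15 days; use the high end
    bool newToolChain = false;           // +5 days
    bool newComplexToolChain = false;    // +15 days
};

double estimateDays(TaskSize size, const Unknowns& u, bool expertOnFamiliarGround) {
    double days = baseDays(size);
    if (u.externalLibrary)       days += 5;
    if (u.unusedExternalLibrary) days += 10;
    if (u.memorySystem)          days += 5;
    if (u.graphicsSystem)        days += 5;
    if (u.graphics3D)            days += 15;
    if (u.newToolChain)          days += 5;
    if (u.newComplexToolChain)   days += 15;
    // An expert engineer working in a familiar area cuts the total by about 50%.
    return expertOnFamiliarGround ? days * 0.5 : days;
}

int main() {
    Unknowns u;
    u.unusedExternalLibrary = true;
    std::cout << estimateDays(TaskSize::Medium, u, false) << " days\n";  // 5 + 10 = 15
}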
Dealing with uncertainty
Programmers are usually sure of themselves. Even the very experienced, good ones, are not good at telling you when they are not sure of something. This leaves product owners and scrum masters in a bit of a dilemma. The bigger issue is that designing a reasonable schedule and providing good estimates becomes a crap shoot when everyone offers wildly optimistic or pessimistic viewpoints on the effort required.
This is where effort bidding comes in. During sprint planning, developers are supposed to look at each feature and estimate the effort. This should include programmers, QA, and possibly producers. Unfortunately, this is the part that programmers hate the most, and it is the most useful tool in decomposition and effort assessment for scrum masters (producers). I believe the reasons programmers hate it so much are: it's time consuming, it feels like averaged random guessing, programmers are forced to justify their positions, and it's nothing like programming.
Whatever the reasons, tearing apart a feature into tasks and then into subtasks, assigning those, adding QA tasks, identifying dependencies, and building a schedule is the sanest way to fix the schedule, improve the development process, and reduce the effort and rework caused by unforeseen dependencies.
What to do when you get it wrong
You will get it wrong. Occasionally a task will be misestimated in the other direction and the team will overestimate the effort. Overestimating is rare, but when it happens, call it out, try to understand what happened, and acknowledge the great work of the engineer; this is usually appreciated.
But managing perceptions is what you do before you get it wrong... more on that below. Once you've missed an estimate, take the following steps.
- Make the hard choice of whether the work should be continued, reassigned, or dropped altogether. This is usually based on the importance of the feature.
- Estimate the work remaining, if possible.
- Discover a mitigation strategy. e.g. Bringing in a domain expert for a consult.
- Put that task into the backlog.
- Wait until the next sprint planning session to try again.
Finally, inform the major stakeholders as soon as you can. Sending an email is helpful, or even holding a quick meeting to verify that your mitigation strategy is correct and that the feature is still important.
Managing perceptions
This may be the trickiest part. Overestimating the length of feature development may mean that the feature is cut because it appears too costly. Underestimating it may mean the same thing, since the feature will likely not work correctly and will put the project at risk much later.
So, describing the risk of a feature is tough. It becomes much easier if the task is broken down into more manageable pieces: only a few of the pieces are likely to run over, and with enough slack in the schedule they may not affect the overall delivery date at all. Still, overages do occur, and they need to be managed in order to reduce pressure on engineers (pressure leads to mistakes) and to help mgmt understand what a great job everyone is doing, which leads to perceived success and company happiness. In addition, a well-managed schedule has enough slack that people have free-thinking time to properly evaluate engineering needs, pursue alternate solutions, and debug difficult problems.
Also, with enough smaller tasks used to describe a large feature, an overage is much easier to explain, and easier for your boss to accept, when a 3-day task surprisingly takes 5 days instead.
Even with all of these pieces, it is a good idea to identify and call out features that are large and inherently riskier. If a feature breaks down into 20 tasks, it is likely that some tasks were missed, some tasks will run over, and the feature will be delayed. Other things like engineer fatigue, sick days, and surprise tasks will change the schedule and further delay the delivery of larger features. Calling these riskier features out to mgmt during alignment meetings, casual talks over the coffee machine, and hallway conversations keeps other managers aware. Also, telling the engineers involved during sprint planning meetings shows a level of understanding, maturity, and mgmt expertise, creating a more collaborative atmosphere when working with major unknowns.
Another strategy is coloring tasks based on engineer assessment. A simple question like "What is the chance that this might run over the 3-day estimate you gave, on a scale of 1 to 3... 1 being none and 3 being likely?" is quick, doesn't take much thought, and lets you color code tasks in your backlog. You can also adjust your burndown, or allow more time for riskier sprints. This kind of communication takes a little getting used to, but it lets everyone know that some things are harder to know up front and may be delayed.
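A sketch of how that 1-to-3 answer could be attached to backlog tasks for color coding (the struct and field names are illustrative, not from any particular tool):

#include <iostream>
#include <string>
#include <vector>

// The engineer's own answer to "what is the chance this runs over your estimate?"
// 1 = none, 2 = possible, 3 = likely.
enum class OverrunRisk { None = 1, Possible = 2, Likely = 3 };

struct BacklogTask {
    std::string name;
    double estimateDays;
    OverrunRisk risk;
};

// Map the answer to a color for the backlog view.
const char* riskColor(OverrunRisk risk) {
    switch (risk) {
        case OverrunRisk::None:     return "green";
        case OverrunRisk::Possible: return "yellow";
        case OverrunRisk::Likely:   return "red";
    }
    return "unknown";
}

int main() {
    std::vector<BacklogTask> backlog = {
        {"Review 3D library docs", 1.0, OverrunRisk::Possible},
        {"Add files to Perforce and compile", 0.5, OverrunRisk::Likely},
    };
    for (const auto& task : backlog)
        std::cout << task.name << ": " << riskColor(task.risk) << "\n";
}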
Possible conversation:
Bill: "So how long will it take to integrate that 3D library?"
Steve: "about 2 weeks."
Bill: "Alright, let's break this up. We want a lot of smaller task to help us deliver stuff so that mgmt will be happy with our progress and to communicate better with the team. What are the first steps?"
Steve: "Well, I have to look over the docs. That will take a few hours."
Bill: "That usually takes a long time... let me give you a day. You don't need to use the whole day to do that, bu you can use a few hours here and there as you need. Consider it an open tab at a bar. Then what's next?"
Steve: "Then I need to find the files and put them into our project and start integrating."
Bill: "Whoa... too many 'ands'. Let's break that up. How long to find the files?"
Steve: "an hour or two... I need to find the download site and make sure that I have the right version, pull it down, etc."
Bill: "Actually, that sounds pretty easy... is 3 hours about right?"
Steve: "That's way too much.. I'll only only need about 1 1/2 hours. Tops."
Bill: "Nice, but what if there are Linux and Windows versions of the libs? What if some compile only in 64 bit? What if the library requires an older version of Visual Studio? Let's leave it at 3 hours, and if you don't use it all, then we have given you a little slack. On a big task like this, you will probably need extra wiggle room, not less."
Bill: "... so then you said that putting the files into the project was next... that means adding those to Perforce too, right?"
Steve: "Yeah... but I should be able to dump the files in and just compile. That should only take a few minutes."
Bill: "Doesn't this need to compile in Linux too?"
Steve: "Yeah... so I'll need to update the make file on the Linux side and the project in Visual Studio... still that's pretty easy."
Bill: "Aren't there usually compile issues when you first integrate a new library? Then include paths because those libs may not know where boost is and other kinds of issues? And you still need to get that into Perforce.... let's give you 3 hours for that. We're up to two days before the integration truly starts."
Steve: "That seems too high."
Bill: "Well, if you finish the full integration early, then you can work on other stuff that you think that we need to do, but you never have time for. Weren't you going to cleanup that Lua library? If you finish early, you can do that."
...
Wednesday, March 5, 2014
Designing a secure gaming network
What does secure mean? For most people it means a site or server that cannot be hacked.
In May 2013, one of the largest banks in America (Bank of America), with one of the most secure banking systems in the US, was hacked. The hackers took over ATMs in NYC and, in just 4 hours, withdrew 20 million dollars from user accounts. This was just a few months ago... and this is supposed to be security at its best. In July 2013, Apple was hacked and their developer accounts were compromised, possibly exposing all developer accounts. Their email (which I received) said that all developer accounts were encrypted but that they were going to reset all developer accounts "just in case."
For the BofA attack, there are some minor caveats, like the hackers having internal information, but that hardly matters... how did the hackers get in? Having internal info should not have made this possible. At Apple, no explanation was offered, but how did this happen? This is a huge deal, and what about that encryption... does it matter?
The big boys, who have enormous resources and fabulous teams of anti-hacker developers, still manage to be hacked. If they are hacked, what can a lowly game developer hope to do against the onslaught of hackers out there? Before you get upset: there is a ton that you can do to prevent the very thing that caused so many problems for Apple and BofA. It comes down to architecture.
Before I begin, let me say for the record: you must make an effort to thwart hackers to prevent the casual hacker or lazy programmer from getting into your system. That said, I promise you: you cannot build a better security system than BofA. The trick is to limit the damage and to detect the intrusion. The rule is: Build the best system possible but also realize that hackers will break into your system eventually. This is where a good architecture matters.
Any place where external users can attack your systems, or just use your systems, you have a potential threat and a vulnerability. Most people look at game network systems as being something like this:
Each part of this diagram has a ton of vulnerabilities. The phone or PC can be hacked, users can snoop on their own PC and then replay the data traffic to the server, a man-in-the-middle can listen to someone's network traffic and try to pretend that they are making purchases, someone can attack the server directly by sending bad packets or good-looking data that is meant to take over your server and give them access to your database... and on and on. Where do we server programmers have any control? Only at the server level? Not so fast... we have far more control than that.
We can control the client app; we can hash passwords before they are sent and never store plaintext passwords on the server (you store the hash, so if hackers get the DB they still can't use the stolen accounts); the communications can be controlled; and if at any time we detect abuse, we can shut down the abusive connection, etc. All of these are potential places to prevent hacking (and to optimize).
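As one example of that first point, here is a minimal sketch of hashing a password on the client before it is ever sent, assuming OpenSSL's legacy one-shot SHA256() call is available; a real system would also mix in a per-user salt and use a deliberately slow hash rather than bare SHA-256:

#include <openssl/sha.h>   // link with -lcrypto

#include <cstdio>
#include <string>

// Hash the password on the client so the plaintext never crosses the wire
// and never lands in the DB. NOTE: production systems should also add a
// per-user salt and use a slow KDF (bcrypt/scrypt/PBKDF2), not bare SHA-256.
std::string hashPassword(const std::string& password) {
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256(reinterpret_cast<const unsigned char*>(password.data()),
           password.size(), digest);

    char hex[2 * SHA256_DIGEST_LENGTH + 1];
    for (int i = 0; i < SHA256_DIGEST_LENGTH; ++i)
        std::snprintf(hex + 2 * i, 3, "%02x", digest[i]);
    return std::string(hex, 2 * SHA256_DIGEST_LENGTH);
}

int main() {
    // Only this hex string is transmitted and stored; the server never sees
    // the plaintext password.
    std::printf("%s\n", hashPassword("correct horse battery staple").c_str());
}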
The biggest problem with the above diagram? The server's direct access to the DB. In fact, this is so flawed that you are practically giving hackers free access to the db, and you will never detect an intrusion when it happens, because most likely the hackers will take over the server using a buffer overflow and then simply copy everything on your server so they can look at it at their leisure. They tend to target specific database items, but when they don't know what they are doing, they may just copy everything.
The communication protocol should be small and will be the topic of another blog post.
For any internet-facing process, I prefer to store nothing (like a password file) alongside the exe. My style of security is to write an exe that:
1) Can be launched remotely
2) Can be isolated in a VM or on its own hardware
3) Has no DB access
4) Opens a socket on launch and takes commands like "listen on port 3400" and "forward all packets to ipaddress 10.1.34.192"
5) Runs on a machine/VM isolated in such a way that when hackers break into it (they always do), they find no configuration data, no local files, no db access, etc. This prevents them from learning anything about your systems.
6) Is watched by a monitor application that checks that your app is still running and, when your app comes down, relaunches it and sends it its configuration parameters... no worries.
They hacked your system, but you can rest assured that even with a buffer overflow, and even if they manage to send your exe back to themselves, they won't learn anything about the internal configuration of your systems, since your app doesn't know those things.
No system is perfect, but with this design it is extremely difficult to take over the server, and when attackers do, this "gateway" style of app knows nothing, so they get nothing and they certainly have no access to user accounts.
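A sketch of the idea behind items 4 and 5: the gateway binary ships with no configuration at all and only learns its port and forwarding target from commands sent over its control channel at runtime. The command strings and struct below are invented for illustration, not the actual protocol:

#include <iostream>
#include <optional>
#include <sstream>
#include <string>

// The gateway's entire runtime configuration. Nothing here is read from disk;
// it only exists in memory after the monitor process sends commands.
struct GatewayConfig {
    std::optional<int> listenPort;         // e.g. "listen 3400"
    std::optional<std::string> forwardTo;  // e.g. "forward 10.1.34.192"
};

// Apply a single control command received over the boot socket.
// Returns false for anything unrecognized (the connection could then be dropped).
bool applyCommand(const std::string& line, GatewayConfig& config) {
    std::istringstream in(line);
    std::string verb;
    in >> verb;
    if (verb == "listen") {
        int port = 0;
        if (in >> port) { config.listenPort = port; return true; }
    } else if (verb == "forward") {
        std::string address;
        if (in >> address) { config.forwardTo = address; return true; }
    }
    return false;
}

int main() {
    GatewayConfig config;
    // In the real process these lines arrive from the monitor over a socket;
    // they are hard-coded here purely to show the flow.
    applyCommand("listen 3400", config);
    applyCommand("forward 10.1.34.192", config);

    if (config.listenPort && config.forwardTo)
        std::cout << "forwarding port " << *config.listenPort
                  << " -> " << *config.forwardTo << "\n";
}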
Storing anything in a local directory leaves you vulnerable. If you must do it, then Unix/Linux is the best choice because you can lock down the file system to prevent hackers from looking at your files (Windows can be configured this way, but it's a lot harder to do).
So, for a local file, limit its access to read-only (you won't believe what they can do if they can overwrite your password file... it's a backend buffer overflow where they overwrite it, and when you decrypt it, your app crashes). Then put your app deep in its own separate subdirectory, away from that file. Then give your app limited permissions (definitely not root). Deep directories make it harder to get to /etc and other directories. Here is some good advice (jailkit stuff):
http://stackoverflow.com/questions/527876/how-to-restrict-a-linux-user-to-be-only-able-to-read-home-user-and-nothing-else
When they take over your server (and they will, make no mistake), make it very difficult to change directories, grep, etc. You don't want them to find anything; anything they find could become a vulnerability. It's best to start with a fresh install on a clean box and not install anything on it that isn't absolutely necessary. A VM is great for this because it's just a file. When you finish configuring the VM the way you want, copy the VM file and expect some hacker to get into your system and completely destroy your VM. Then you can simply restore from backup (a file copy... really). It should take about 30 seconds to bring your server back up.
Whatever you do... do NOT store a list of IP connections, ports, or anything that gives access to your DB. Never, never, never let your internet-facing gateway process have access to the DB. Don't even include the DB connector code, and do not install the ODBC driver on that computer either. You do not want to give hackers even a small chance of figuring out that you use SQL Server (MySQL, Oracle, whatever), so the less info you have on that internet-facing box, the less damage they can do.
This is the current server architecture that I have built (a diagram I use internally) and the gateways run in a completely different VM. All of the games run on another box and cannot be accessed from the gateways except through internal routing. When someone hacks the gateway, user info is never exposed.
Another critical piece is the trapezoid that reads "Is user permitted". This lives in the gateway: for malformed packets, too many packets in a short time, buffer overflow attempts, etc., I shut down the connection. Because we want hackers to think that they are being successful and not know that we spotted them, we put the socket into "ignore-mode", which means it continues to receive but will shut down the socket 30-130 packets later. This increases the amount of work that a hacker has to do by at least an order of magnitude.
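A sketch of that "ignore-mode" bookkeeping; the 30-130 range mirrors the text above, while the class shape and random choice are illustrative:

#include <cstdint>
#include <iostream>
#include <random>

// Per-connection state in the gateway. Once abuse is detected we keep reading
// and silently discarding packets so the attacker thinks the probe is working,
// then close the socket 30-130 packets later.
class Connection {
public:
    void flagAsAbusive() {
        if (ignoring_) return;
        ignoring_ = true;
        std::random_device rd;
        std::mt19937 gen(rd());
        packetsUntilClose_ = std::uniform_int_distribution<int>(30, 130)(gen);
    }

    // Called for every packet received on this socket. Returns true while the
    // socket should stay open, false when it is time to shut it down.
    bool onPacket(const std::uint8_t* /*data*/, std::size_t /*len*/) {
        if (!ignoring_) return true;      // normal traffic: process as usual
        // Ignore-mode: drop the packet on the floor, count down to the close.
        return --packetsUntilClose_ > 0;
    }

private:
    bool ignoring_ = false;
    int packetsUntilClose_ = 0;
};

int main() {
    Connection c;
    c.flagAsAbusive();
    int dropped = 0;
    while (c.onPacket(nullptr, 0)) ++dropped;
    std::cout << "closed after silently dropping " << dropped + 1 << " packets\n";
}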
The other systems have a distinct, highly specific protocol for talking to the gateway. This protocol is different from the protocol for talking to the client. When hackers do take over the gateway, they have no way to talk to the game servers, they have no access to the DB, and they can only talk on specific ports and IP addresses. This system is not unbeatable, but based on 20 years of network security design, this is my latest and most secure.
Splitting your DB into sensitive data and common data can help too. If a hacker has to go to different DBs (schemas) to try to get user account info, it quickly becomes too much work to be worth it. Splitting user account info (credit cards) and purchases is trivial in the DB world, but piecing those back together is very hard. Using an 'id' field is often trivial to replicate, so I bind them using a UUID so that putting it all back together is next to impossible.
One last step I take is to never, ever send user db indices (ids) back to the client. Way too many security flaws are based on handing the user their own DB index (e.g. user_id=42336). If a hacker gets hold of your db through SQL injection, we don't want him to dump the user table by knowing the structure or using simple indices. For each user, I generate a unique identifier (UUID) that is sent back to the client. When they get a list of friends or other things, they only see these hard-to-hack UUIDs. Hackers would have no idea how to use that information to look up user info. The gateway knows nothing about UUIDs either, so hacking our gateway reveals no user info at all.
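A sketch of handing out an opaque identifier instead of the row id, using a random 128-bit value formatted like a UUID (generated here with <random> for illustration; a real system would persist the mapping in the DB alongside the account):

#include <cstdint>
#include <cstdio>
#include <random>
#include <string>
#include <unordered_map>

// Generate a random 128-bit identifier formatted like a UUID.
std::string makeUuid() {
    static std::mt19937_64 gen{std::random_device{}()};
    std::uniform_int_distribution<std::uint64_t> dist;
    const std::uint64_t hi = dist(gen);
    const std::uint64_t lo = dist(gen);
    char buf[37];
    std::snprintf(buf, sizeof(buf), "%08x-%04x-%04x-%04x-%012llx",
                  static_cast<std::uint32_t>(hi >> 32),
                  static_cast<std::uint32_t>((hi >> 16) & 0xffff),
                  static_cast<std::uint32_t>(hi & 0xffff),
                  static_cast<std::uint32_t>(lo >> 48),
                  static_cast<unsigned long long>(lo & 0xffffffffffffULL));
    return buf;
}

int main() {
    // Internal row id -> public UUID. Only the UUID ever leaves the server.
    std::unordered_map<std::uint64_t, std::string> publicId;
    const std::uint64_t internalUserId = 42336;   // never sent to the client
    publicId[internalUserId] = makeUuid();
    std::printf("client sees: %s\n", publicId[internalUserId].c_str());
}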
Encrypting packets will be covered next.
Monday, August 5, 2013
C++ using globals
The rule that you should almost never break: Never use globals.
Reasons:
1) Namespacing. Compilers are very good at telling you that a name is already taken. Once your application passes 10k lines of code, the number of namespace problems you will face increases exponentially. Global variables each need a unique name and the more you use, the harder it is to invent new ones.
2) Working with other people. As sure as there is a Sun, people on your project will misuse your global variables. Your global variable set to 0 will almost always mean something to one person and something else to someone else. Globals are notorious for causing fragility for this very reason.
3) Instancing and memory. A global variable is created before you ever hit main. Complex variable types will even invoke 'new', meaning that crashes can and do happen before you hit main. No debugger in the world will help you find that bug; in other words, you will never find it through debugging. You'll have to be smarter than the compiler. Also, crashing on exit is another side effect of globals: most apps with complex global variables are not able to clean up correctly and crash when exiting. These kinds of problems are ruthlessly difficult to solve.
4) Design. Relying on globals means that you are probably not a very good organizer. Code should be organized well, and relying on globals means that you've run out of ideas for controlling how data is passed, initialized, and stored. Look into better designs and your life becomes easier.
5) Performance. Globals are slower than local variables (on the stack) or member variables (usually part of the stack and usually cached). Globals are stored in another part of the application that you have no control over (neither stack nor cache), so they will rarely be cached and will almost always be slower.
6) Address space. Globals live in the same memory space as your application's code. If you have a string, for example, and you write past the end of it (a common beginner mistake), you can actually overwrite the code itself. This can create some very interesting results.
7) Portability. Code that relies on globals, especially across modules, is not reusable... in general. I would even go as far as to say that you will never be able to use that code outside of that one project.
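One common alternative, sketched below: put the would-be globals into a context object that main owns and passes explicitly to whatever needs it, so ownership, initialization order, and lifetime are all visible. The names here are illustrative:

#include <iostream>
#include <string>

// Instead of:  int g_score;  std::string g_playerName;  // globals
// bundle the shared state into a context that main owns and passes around.
struct GameContext {
    std::string playerName;
    int score = 0;
};

// Every function that needs the state says so in its signature.
void awardPoints(GameContext& ctx, int points) {
    ctx.score += points;
}

void printStatus(const GameContext& ctx) {
    std::cout << ctx.playerName << ": " << ctx.score << " points\n";
}

int main() {
    GameContext ctx{"Steve", 0};   // constructed after main starts, destroyed before it ends
    awardPoints(ctx, 150);
    printStatus(ctx);
}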
Monday, December 5, 2011
Benefits of TDD
From Reddit.com:
Meta-analysis of over thirty studies found no consistent effect from TDD. One clear finding was that the better the study, the weaker the signal.
Greg Wilson's lecture: http://vimeo.com/9270320
and book http://www.amazon.com/Making-Software-Really-Works-Believe/dp/0596808321
Wilson's post about the subject: http://www.neverworkintheory.org/?p=139
I’m still not sure what to think about test-driven development. On the one hand, I feel that it helps me program better—and feel that strongly enough that I teach TDD in courses. On the other hand, studies like this one, and the other summarized in Erdogmus et al’s chapter in Making Software, seem to show that the benefits are illusory. That might mean that we’re measuring the wrong thing, but I’m still waiting for one of TDD’s advocates to say how we’d measure the right thing.
The benefits from TDD come in a few different forms:
1) Change. As designs change and evolve, test harnesses reflect the original design and should provide a consistent framework for validation. They may need to be adapted to major design changes.
2) Simple validation. Verifying that your code works... both in the test harness and in the production code you have written.
3) Understanding. If you can create a strong test harness, you'll have a much better idea of what the flow of data is and how it will be consumed. Without this, you are only guessing.
4) Workflow. For mid-level and junior programmers, this provides a consistent workflow and helps build good habits for testing production code.
5) Speedier production. Often, you do not have access to the consumer of your interfaces. TDD allows you to work in a vacuum, and people can refer to your harness when trying to understand how to use your code.
6) Automated testing. Your harness can be part of your build and guarantee the status of many parts of the system without manually running your code.
For all of these reasons and more, TDD just works. I have worked on many failed teams that did not use TDD. I have worked on three teams that used TDD, and all three shipped on time and under budget. I have never worked on a team using TDD that has failed.
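To make the workflow in point 4 concrete, here is a tiny test-first sketch using plain assert (no particular test framework assumed): the test is written against the interface first, then the function is implemented until the test passes, and the harness then runs as part of the build.

#include <cassert>
#include <vector>

// Step 1: declare the interface the consumer needs.
int sumOfSquares(const std::vector<int>& values);

// Step 2: write the test first. It cannot pass until the
// implementation below exists and is correct.
void testSumOfSquares() {
    assert(sumOfSquares({}) == 0);
    assert(sumOfSquares({3}) == 9);
    assert(sumOfSquares({1, 2, 3}) == 14);
}

// Step 3: implement until the test passes.
int sumOfSquares(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) total += v * v;
    return total;
}

int main() {
    testSumOfSquares();   // run as part of the build to guard against regressions
    return 0;
}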
Wednesday, October 19, 2011
How Does a Good Leader React to Crisis?
1) Remain calm. Running around frantically or reacting to excited or worried people creates the impression that things are much worse than they are. Remaining calm has a calming effect on others and demonstrates that you are in charge of your emotions. Along with this, when sending out emails, avoid emotionally charged words and stick to facts:
Good example: "Today, we were informed that our margins have become razor thin and we are surviving by borrowing money to help us through this difficult time."
Bad example: "As you may have noticed, we are bleeding money and our profitability is in the tank. We are going through some rough times and we see no end in sight, so we are forced to go to the bank to keep us afloat."
2) Find solutions to your crisis and focus on those. Avoid blaming people. Even though some people or divisions may not have been generating revenue for a while, they are generally trying to do so, and blaming them reduces morale throughout the company. Some people may have actively torpedoed the company, but instead of pointing the finger, focus on solutions. If you need to call out someone who deliberately hurt the company, do it deliberately and calmly, and then explain how you are going to fix the problem; do not spend time talking about this person, which detracts from your solutions.
3) Put together a fire-fighting team. These people will have one job: to implement your solutions, or possibly to find solutions. Do not use the regular employees, since they may have been part of the problem to begin with and they presumably already have other jobs; you need people whose other work will not hurt the company too much if they are pulled away from it. These fire-fighters must be people that others trust, and they must have a lot of authority over budgets, the ability to move people around, the ability to recommend removal of people (no firing authority), and absolute visibility into the entire company, including you. They must also have the flexibility to modify your plan if they think it isn't working or won't work. Also, give these people a deadline, say three months, and a firm goal (or ten), like cutting budget overruns by 70%.
Good leaders do not react -- they respond!
Tuesday, October 11, 2011
Where will I ever need this math?
For a career, you need certain skills. There are six main areas where mathematics is needed, and a bunch of sub-areas... ignoring math for math's sake and teaching. Most of the people in these areas are highly respected and often highly paid.
1) Computer games, simulations, military simulators, etc. The level of math varies but usually requires at least two semesters of Calc, Linear Algebra, and some exposure to Vector Calc. I work in this area.
2) Economics, business, predictive economics, insurance. This usually finishes Calc and then goes into probability, statistics, and actuarial science (insurance).
3) Research, materials research, chemistry, heat transfer, nuclear plant building, solar cell creation. Most of these delve into a unique blend of statistics and differential equations.
4) Engineering, bridge building, aircraft design, process engineering, industrial design. This is almost entirely Calculus and applied physics.
5) Biological research, genetics, protein analysis, pharmaceutical research. This is applied chemistry, biological sciences, and computer simulations mixed with Abstract Algebra (protein folding), Calculus, and various fluid sciences (fluids is a very hard part of physics/maths).
6) Applied physics, nuclear research, grand unification, particle/string theorists. These are all post-graduate people and you had better study a lot of physics and have known Calculus for 20 years. Most of these people have advanced physics degrees and at least a masters in maths. This is DiffEQ, Lie Algebra, Symmetry, Geometry, and much much more.