The Art of Scalability
2019-03-31 09:35:19
Reading notes: The Art of Scalability
Outline / Contents
Staffing a scalable organization
a tool for defining responsibilities (RASIC)
Responsible
Accountable/Approver
Supportive
Consulted
Informed
organization structure
team size
factors
Business needs
Managerial Duties
Team members tenure
manager experience
warning signs
too big
poor communication
lowered productivity
poor morale
too small
disgruntled business partners
micromanaging managers
overworked team members
team structure
functional organization
matrix organization
leadership
leadership is influencing the behavior of an organization or a person to accomplish a specific objective (pulling method)
mission first, people always
vision(future)
vivid description of ideal future
important to shareholder value creation
measurable
inspirational
incorporate elements of your beliefs
mostly static,but modified as the need presents itself
easily remembered
mission(path)
descriptive of current state and actions
a sense of purpose
measurable
general direction or path toward the vision
goals(SMART)
specific
measurable
attainable but aggressive
realistic
timely or contain a component of time
causal roadmap to success
help employees understand how they contribute to those goals and to the creation of shareholder value
management
judicious and ethical use of means to accomplish an end
project and task management
decompose a goal into component parts
determine relations of those parts
assignment of ownership with dates
measurement of progress to those dates
people and organization management
seeding
feeding
weeding
measurement
availability
response time
engineering productivity and efficiency
cost
quality
pave the path to success
goal tree
business case
there is a chasm between technologists and other business leaders
technologists must take responsibility for crossing over into the business in order to bridge the chasm
initiatives must be put in terms the business leaders can understand
calculating the cost of outages and downtime can be an effective method of demonstrating the need for a business culture focused on scalability
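The outage-cost point above can be sketched as a back-of-the-envelope calculation. This is a minimal illustration, assuming revenue accrues evenly across the year (real traffic rarely does); the function name and the figures are hypothetical, not taken from the book.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def outage_cost(annual_revenue: float, downtime_minutes: float) -> float:
    """Estimate revenue lost to downtime.

    Assumption: revenue is spread evenly over every minute of the year.
    """
    revenue_per_minute = annual_revenue / MINUTES_PER_YEAR
    return revenue_per_minute * downtime_minutes


# Hypothetical figures: 52.56M in annual revenue and a 90-minute outage.
lost = outage_cost(52_560_000, 90)
print(f"Estimated revenue lost: {lost:,.0f}")
```

A figure like this is only a lower bound, since it ignores lost customers and reputational damage, but it gives business leaders a number in their own terms.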
Building Processes for Scale
process
they augment the management of our teams and employees
they standardize employees' actions while performing repetitive tasks
they free employees up from daily mundane decisions to concentrate on grander, more creative ideas
processes, such as application design or problem resolution, are a critical part of scaling an application
deciding to implement any process at all is the first step
determining the optimal amount of process
migrating from small to large through periodic changes
let the teams decide on the right amount
maintenance of process is also critical to ensure organizations do not outgrow processes
poor fit between the organization and process
culture clash
bureaucracy
incident & problem
incidents are issues in our production environment
incident management is the process focused on timely and cost-effective restoration of service
D: detect
R: report
I: investigate
E: escalate
R: resolve
define ahead of time how much time should be allowed to collect data and what data should be collected
incident life cycle
open
resolved
closed
problem life cycle
open
identified
closed
problems are the cause of incidents
problem management is the process focused on determining the cause of and correcting problems
daily incident meeting
quarterly incident review
postmortem
Managing crisis & Escalations
measure of order: resolution, escalation, and communication processes
escalation & status communication should be defined during crisis
crisis postmortems
controlling change in production environments
change identification is a lightweight process
exact time of the change
system undergoing change
actual change
expected results of the change
contact information of person making the change
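The five fields listed above are small enough to capture in a single log record. The sketch below is one possible shape, assuming a pipe-delimited log line; the class name, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One change-identification entry with the five fields from the list above."""
    changed_at: datetime      # exact time of the change
    system: str               # system undergoing change
    change: str               # actual change
    expected_result: str      # expected results of the change
    contact: str              # contact information of the person making the change

    def log_line(self) -> str:
        # Assumed format: ISO 8601 timestamp first so that entries sort by time.
        return " | ".join([
            self.changed_at.isoformat(),
            self.system,
            self.change,
            self.expected_result,
            self.contact,
        ])


record = ChangeRecord(
    changed_at=datetime(2019, 3, 31, 9, 35, tzinfo=timezone.utc),
    system="db01",
    change="raise max_connections from 200 to 400",
    expected_result="no connection refusals at peak",
    contact="ops@example.com",
)
print(record.log_line())
```

Keeping the record this small is what lets change identification stay lightweight; the heavier fields (risks, dependencies, approvals) belong to change management.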
change management
proposed
system
expected result
how change is to be performed
known risks
known dependencies
relationships to other system
approved
scheduled
implemented and logged
similar to change identification
validate as successful
reviewed and reported on over time
review of the change process
Determining Headroom
Headroom calculation checklist
identify major components
assign responsibility for determining actual usage and maximum capacity
growth rate
determine intrinsic or natural growth rate
determine business activity based growth rate
determine peak seasonality effects
estimate headroom or capacity reclaimed by infrastructure projects
make headroom calculation
headroom = (ideal usage percentage*Maximum capacity) - current usage - growth + optimization
time remaining = headroom / (growth rate + seasonality - optimization)
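The two formulas above translate directly into code. This is a minimal sketch that follows the notes as written; the parameter names and sample numbers are illustrative, and it assumes all quantities share one unit (for example queries per second) and that the rates are expressed per planning period.

```python
def headroom(ideal_usage_pct: float, max_capacity: float,
             current_usage: float, growth: float, optimization: float) -> float:
    """headroom = (ideal usage percentage * maximum capacity)
                  - current usage - growth + optimization"""
    return ideal_usage_pct * max_capacity - current_usage - growth + optimization


def time_remaining(headroom_value: float, growth_rate: float,
                   seasonality: float, optimization_rate: float) -> float:
    """time remaining = headroom / (growth rate + seasonality - optimization)"""
    net_rate = growth_rate + seasonality - optimization_rate
    if net_rate <= 0:
        # Usage is flat or shrinking, so the headroom never runs out.
        return float("inf")
    return headroom_value / net_rate


# Hypothetical component: 10,000 qps maximum, run at no more than 50% of it.
h = headroom(ideal_usage_pct=0.5, max_capacity=10_000,
             current_usage=2_500, growth=1_000, optimization=500)
print(h)                                 # 2000.0
print(time_remaining(h, 400, 200, 100))  # 4.0 periods
```

The guard for a non-positive net rate is an addition of this sketch; the formula in the notes does not say what to do when optimization outpaces growth.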
architectural principles
be developed from goals and be aligned to vision and mission
AKF 12 principles
Availability
N+1 design
Design for rollback
Design to be disabled
Design for multiple live sites (S)
Cost
Buy when non core
Commodity hardware
Use Mature technologies (A)
Scalability
Design for at least two axes
Design to be Monitored (A)
Asynchronous Design (A, C)
Stateless Systems (A, C)
Scale out not up (A,C)
Joint Architecture Design
checklist
assign participants
software engineer
architect
operations engineer
database admin
system admin
network admin
product manager
project manager
quality assurance engineer
schedule sessions: divide sessions by component: database, server, cache, storage, etc.
start session by covering specifications
review the architecture principles related to session's component
brainstorm approaches
list pros and cons of each approach
write down the ideas
arrive at consensus for design
create final document of design
Architecture Review Board
board constituency
leaders
well respected
expertise in some area of the application or architecture
purpose
to ensure the design supports the business needs, including scalability
Buy versus Build
Focus on cost
Focus on strategy
are we the best providers
provide sustainable competitive differentiation
Merging Cost and strategy
does this component create strategic competitive differentiation
are we the best owners of this component or asset
what is the competition to this component
can we build this component cost effectively
determining the risk
assess risk
gut feeling
traffic light
failure mode and effect analysis
determine the proper level of granularity to assess the risk
choose an appropriate scoring scale (1, 3, 9)
likelihood of failure occurring (1 = low, 3 = medium, 9 = high)
severity if failure occurs (1 = minimal, 3 = significant, 9 = extreme)
ability to detect should failure occur (1 = easy, 3 = medium, 9 = difficult)
remediation actions & revised risk score
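The scoring steps above can be sketched in a few lines. The example assumes the total risk score is the product of the three factors, which is the usual convention for failure mode and effect analysis; the function name and the failure modes are hypothetical.

```python
ALLOWED_SCORES = {1, 3, 9}


def risk_score(likelihood: int, severity: int, detectability: int) -> int:
    """Total risk score, assumed here to be the product of the three factors."""
    for name, value in (("likelihood", likelihood),
                        ("severity", severity),
                        ("detectability", detectability)):
        if value not in ALLOWED_SCORES:
            raise ValueError(f"{name} must be 1, 3, or 9, got {value}")
    return likelihood * severity * detectability


# Hypothetical failure modes: (description, likelihood, severity, detectability)
failure_modes = [
    ("signup form rejects valid email", 3, 3, 1),
    ("payment service times out under load", 3, 9, 3),
    ("silent data corruption in nightly job", 1, 9, 9),
]

# Rank failure modes from highest to lowest risk.
ranked = sorted(failure_modes, key=lambda fm: risk_score(*fm[1:]), reverse=True)
for description, *scores in ranked:
    print(risk_score(*scores), description)
```

Re-running the same function after a remediation action (for example, adding monitoring lowers detectability from 9 to 1) gives the revised risk score.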
managing risk
risk type
acute risk: the risk of a single change, such as a release or a maintenance procedure
overall risk: watching and administering the total level of risk in the system at any point in time
adoptions of rules
adoptions of rules for specified predetermined amounts of risk
escalation path for the critical crisis
risk process must be used and followed
make sure it is a good fit for the organization
performance and stress test
performance test: identify, document, and eliminate bottlenecks
criteria: establish what criteria are expected for the application and its components
environment: make sure the test environment is close to the production environment
define tests: find the 20% of tests that will provide 80% of the information
execute test
analyze data
report to engineers
repeat test & analysis
stress test: determine the behavior under abnormal loads
identify objectives
establish baseline
behavior during failure and recovery
behavior during loss of resources
how failure of one service affects the entire system
identify key services
determine load
environment
identify monitors
create load
execute test
analyze data
impact scalability through the areas of headroom, change control, and risk management
barrier conditions and rollback
barrier conditions
go/no-go process exists to isolate faults early in your development life cycle
methods: architecture review board, code reviews, performance testing, production measurements
rollback capabilities
an insurance policy for your business, shareholders, and customers
markdown capabilities
design to be disabled
fast or right
tradeoffs in business
project triangle
methods
gut feel
pro/con comparison
decision matrix
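Of the three methods above, the decision matrix is the one that is easy to make mechanical. The sketch below is one common form, a weighted sum per option; the criteria, weights, and scores are hypothetical, and reusing the 1/3/9 scale here is a choice of this example.

```python
def decision_matrix(weights, scores):
    """Return the weighted total for each option.

    weights: {criterion: weight}
    scores:  {option: {criterion: score}}
    """
    return {
        option: sum(weights[criterion] * option_scores[criterion]
                    for criterion in weights)
        for option, option_scores in scores.items()
    }


# Hypothetical "fast or right" tradeoff.
weights = {"speed": 3, "quality": 2, "cost": 1}
scores = {
    "ship now":  {"speed": 9, "quality": 3, "cost": 3},
    "fix first": {"speed": 3, "quality": 9, "cost": 3},
}

totals = decision_matrix(weights, scores)
best = max(totals, key=totals.get)
print(totals)  # {'ship now': 36, 'fix first': 30}
print(best)    # ship now
```

The result is only as good as the weights, so the useful part of the exercise is usually the discussion about what the weights should be.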
Architecting Scalable Solutions
designing for any technology
architectures are not implementations
TAD: technology-agnostic design
increasing the number of options and competitors
lowering switching costs
reduction of cost to scale and risk associated with scale
implementing TAD
cultural initiatives
creating fault isolative architectural structures
a failure in the swim lane is not propagated
swim lanes go in the direction of and never across transaction flow
introduction to the AKF scale cube
x-axis: cloning of services and data with absolutely no bias
y-axis: a separation of work responsibility by type of data or transaction
z-axis: a split biased by the requestor or customer
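The three axes can be combined in a single request router. The sketch below is a toy illustration, not the book's implementation: it assumes the y-axis split is by service name, the z-axis split is a CRC32 hash of the customer ID, and the x-axis clones are chosen round-robin. The class, function, and host names are all hypothetical.

```python
import itertools
import zlib


def z_shard(customer_id: str, shard_count: int) -> int:
    """z-axis: pick a shard biased by the customer (stable hash, then modulus)."""
    return zlib.crc32(customer_id.encode("utf-8")) % shard_count


class Router:
    def __init__(self, pools):
        # pools: {service: [clones of shard 0, clones of shard 1, ...]}
        self._pools = pools
        # One round-robin cursor per (service, shard) pair.
        self._cursors = {
            (service, index): itertools.cycle(clones)
            for service, shards in pools.items()
            for index, clones in enumerate(shards)
        }

    def route(self, service: str, customer_id: str) -> str:
        shards = self._pools[service]               # y-axis: split by service
        shard = z_shard(customer_id, len(shards))   # z-axis: split by customer
        return next(self._cursors[(service, shard)])  # x-axis: any clone will do


router = Router({
    "search":   [["search-0a", "search-0b"], ["search-1a", "search-1b"]],
    "checkout": [["checkout-0a"]],
})
print(router.route("search", "alice"))
print(router.route("search", "alice"))  # same shard, next clone
```

Because x-axis clones carry no bias, any clone within the chosen shard can serve the request, which is why a plain round-robin cursor is enough at that level.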
splitting applications for scale
Three Axis Split Example
splitting database for scale
AKF Database Scale Cube
caching for performance and scale
asynchronous design for scale
Solving Other Issues and Challenges
too much data
clouds and grids
soaring in the clouds
plugging in the grid
monitoring application
planning data centers
putting it all together