Southwest Airlines Dec. 2022, a crisis even agile can't solve
- Todd Kromann

- Jan 2, 2023
- 17 min read
... and there are adaptive synergies that provide answers.
This starts with a question: how might we resolve the root causes immediately, in the near term, and in the long term?
To be clear, there is no agile magic wand for this. There is no magic wand of any name. Let's start with the basics. This storm was a valuable teacher: it exposed the failures of the Southwest system. Fortunately, the people of Southwest Airlines are wonderful, and there are techniques to help them fix Southwest. These techniques go beyond agile. We recommend that Southwest stay focused on its culture; it's marvelous.
Focus on the 'agile' in your Efficiency values
Don’t make the easy hard
Keep costs low
Stay agile -- redefine for beyond agile as below:
Start with these videos
Open Letter to Bob Jordan, CEO
Dear Bob,
I wanted to reach out and share with you some thoughts on the recent disruptions and cancellations at Southwest Airlines. As a fellow CEO, I understand the pressure to keep operations running smoothly and the impact that disruptions can have on our businesses and customers.
Here are some potential root causes and solutions that could help address the issues at hand:
Outdated technology: It appears that Southwest's scheduling software is unable to handle the demands of the current operations, resulting in manual scheduling efforts and a delicate "house of cards" that can quickly fall apart when disruptions occur. One solution could be to invest in modernizing or updating the technology to better handle the demands of Southwest's operations.
Limited staffing: The triple pandemic of Covid-19, the flu, and RSV has likely had an impact on staffing levels, making it more difficult for Southwest to adjust to increased system stress. Additionally, the FAA's strict regulations on flight crew work hours may be limiting the ability to recover from delays or disruptions. One solution could be to offer incentives to employees to work during major travel periods, as other airlines have done, or to review and potentially adjust FAA regulations to allow for more flexibility in recovery efforts.
Insufficient customer service: Customers have reported difficulty getting through to Southwest's customer service lines to rebook flights or find lost baggage. This could be due to a lack of personnel or inadequate training for handling high levels of call volume. One solution could be to increase staffing levels in customer service or to provide additional training to better handle high call volumes.
Aggressive scheduling: Southwest's schedule includes shorter flights with tighter turnaround times, which may be contributing to bottlenecks in the system. One solution could be to review and potentially adjust the scheduling to allow for more buffer time between flights to allow for a smoother flow of operations.
In addition to these solutions, there are also agile and lean techniques that could be implemented to help improve overall agility and resilience at Southwest. These include adopting an open space approach to change, utilizing invitation-based change methods, implementing decision-making frameworks, utilizing liberating structures, and implementing low or no cost leadership practices.
I believe that by addressing these root causes and implementing these solutions, Southwest can not only recover from the recent disruptions, but also improve overall business agility and customer satisfaction. I would be happy to discuss these ideas further and see how our team at Open Agile Solutions can support Southwest in implementing these changes.
Sincerely,
Todd Kromann, beyond agile coach
Solution - beyond agile
Open Agile Solutions is excited to present a comprehensive approach to addressing the ongoing operational issues at Southwest Airlines. As a company that values innovation and customer satisfaction, we believe our solution will not only resolve the immediate challenges, but also set Southwest up for long-term success.
The Situation:
Southwest has faced significant disruptions in the past week, including the cancellation of over 2,500 flights due to a winter storm and the subsequent ripple effects of outdated technology and limited staffing. Customers have struggled to get through to customer service lines and rebook flights, and employees have reported difficulty communicating with the airline.
The Needs:
In order to address these issues and get Southwest back on track, there are a few key needs that must be addressed.
First and foremost, the outdated technology must be updated in order to handle the demands of current operations.
This includes not only the scheduling software, but also the phone and computer systems used by employees.
Additionally, Southwest must address the issue of limited staffing, whether through incentives for employees to work during peak travel periods or by finding other solutions to ensure sufficient staff is available to handle increased system stress.
Finally, improvements must be made to customer service in order to ensure that stranded passengers are able to rebook flights and find lost baggage in a timely and efficient manner.
The Solutions:
In order to address these needs, we recommend a two-pronged approach.
In the near term, we recommend implementing a number of agile manual practices and workarounds in order to improve efficiency and communication within the organization. This includes the use of open space, invitation-based change, decision frameworks, and liberating structures, as well as low or no cost leadership suggestions such as delegating decision-making and empowering teams.
In the mid to long term, we recommend a comprehensive technology upgrade, including the implementation of modern scheduling software and the adoption of agile and lean principles throughout the organization. This will not only improve efficiency and communication, but also foster a culture of continuous improvement and innovation.
The Benefits:
The benefits of this approach are numerous.
In the short term, implementing agile manual practices and workarounds will allow Southwest to quickly and effectively address the immediate issues of limited staffing and outdated technology.
In the long term, the adoption of agile and lean principles and the implementation of modern technology will not only improve efficiency and communication, but also lead to increased customer satisfaction and improved financial performance.
In conclusion, we believe that the open agile solution approach outlined above is the best way to address the ongoing operational issues at Southwest Airlines. By addressing the immediate needs and implementing long-term solutions, we are confident that Southwest will be able to overcome these challenges and emerge stronger than ever before.
Implications
Open Agile Solution Plan
Over the past few days, Southwest Airlines has faced significant challenges due to a combination of bad luck and inadequate planning. A winter storm hit Chicago and Denver, two of Southwest's biggest hubs, just as the so-called tripledemic of Covid-19, the flu, and RSV surged across the country, leaving many staff members sick and unable to adjust to increased system stress. Additionally, Southwest's schedule includes shorter flights with tight turnaround times, which may be contributing to bottlenecks in the system. The company has also struggled with limited staffing and inadequate customer service, as customers have reported difficulty getting through to the airline's customer service lines to rebook flights or find lost baggage.
Open Agile Solutions recognizes that Southwest is already utilizing agile, scrum, and kanban methodologies, but we believe there are additional techniques that could help the airline quickly recover from these challenges. In the near term, we recommend implementing the following manual practices and workarounds:
Open Space: Hold an open space session with all employees to identify and prioritize challenges and potential solutions.
Invitation-Based Change: Invite a diverse group of employees to join a change team and work together to implement solutions.
Decision Frameworks: Use decision frameworks such as RAPID or DECIDE to help the team make informed and effective decisions.
Liberating Structures: Use liberating structures such as 1-2-4-All or Triz to encourage innovative thinking and collaboration.
Low or No Cost Leadership: Encourage leaders at all levels to adopt a low or no cost leadership mindset as Host and empower their teams to take ownership of their work. Immediately break all rules, but set guardrails for team types. Leverage teal culture models, but go slow, with boundaries. Tap into ChatGPT for leadership; there's a model for this that we can detail.
In the mid-term, our 90-day plan includes the following actions:
Upgrade Technology: Invest in modernizing and updating the technology used for flight bookings and employee placement. Think AI. Current systems are brittle. Go antifragile with the business processes.
Increase Staffing: Increase staffing levels to better handle high levels of call volume and system stress.
Training: Provide customer service training to ensure that employees are equipped to handle high levels of call volume and customer inquiries.
Review Scheduling: Review the current schedule and make adjustments as needed to reduce bottlenecks and improve efficiency.
Collaboration: Foster collaboration and open communication between departments to improve overall operations.
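The "Review Scheduling" item rests on the idea that tight turnarounds let one late flight delay every later leg flown by the same aircraft. A minimal sketch of that reasoning follows, assuming a deliberately simplified model in which each turnaround has a fixed slack in minutes that can absorb delay. The function name and all numbers are hypothetical, not Southwest data.

```python
def propagate_delay(initial_delay_min, turnaround_slack_min):
    """Return the delay (minutes) carried into each subsequent leg.

    Simplified assumption: each turnaround has some slack, and that
    slack absorbs delay before the aircraft departs on the next leg.
    """
    delays = []
    delay = initial_delay_min
    for slack in turnaround_slack_min:
        # Whatever the slack cannot absorb carries over to the next leg.
        delay = max(0, delay - slack)
        delays.append(delay)
    return delays


# Hypothetical 60-minute delay on the first leg of a four-turn day.
tight = propagate_delay(60, [5, 5, 5, 5])
padded = propagate_delay(60, [20, 20, 20, 20])
print("tight turnarounds: ", tight)   # [55, 50, 45, 40]
print("padded turnarounds:", padded)  # [40, 20, 0, 0]
```

With only five minutes of slack per turn the aircraft is still 40 minutes late at the end of the day, while 20 minutes of slack clears the delay by the third turn. Real recovery also depends on crew duty limits and connections, which this sketch ignores.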
In the long term, our one-year plan includes the following actions:
Continuous Improvement: Establish a continuous improvement culture by regularly reviewing and adjusting processes and systems.
Employee Engagement: Increase employee engagement through regular check-ins, training, and opportunities for growth and development.
Customer Experience: Enhance the customer experience by prioritizing customer needs and providing
How might we measure success?
Objectives and Key Results for Southwest Q1 2023
Improve Southwest's scheduling software to handle the demands of current operations and reduce the need for manual scheduling efforts. Key results:
Implement new scheduling software within 90 days
Achieve 95% accuracy in automated scheduling within 90 days
Reduce manual scheduling efforts by 75% within 90 days
Increase staffing levels to improve ability to adjust to increased system stress and recover from delays or disruptions. Key results:
Increase staffing levels by 10% within 90 days
Achieve 95% compliance with FAA regulations on flight crew work hours within 90 days
Improve recovery from disruptions by 25% within 90 days
Enhance customer service by increasing personnel and improving training for handling high levels of call volume. Key results:
Increase customer service personnel by 20% within 90 days
Achieve 95% satisfaction rate for customer service within 90 days
Reduce call wait times by 50% within 90 days
Review and adjust scheduling practices to reduce bottlenecks in the system. Key results:
Review and adjust scheduling practices within 90 days
Achieve 95% on-time performance within 90 days
Reduce cancellations by 50% within 90 days
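One way to track these key results is to record a baseline, a target, and a current value for each, then report how much of the gap has been closed. The sketch below is illustrative only: the `KeyResult` class, the baselines, and the current values are assumptions made up for the example, not figures from Southwest.

```python
from dataclasses import dataclass


@dataclass
class KeyResult:
    name: str
    baseline: float  # value at the start of the quarter
    target: float    # value we want to reach within 90 days
    current: float   # latest measured value

    def progress(self) -> float:
        """Fraction of the baseline-to-target gap closed, clamped to [0, 1].

        Works for metrics that should go down (wait times) as well as up.
        """
        span = self.target - self.baseline
        if span == 0:
            return 1.0
        return max(0.0, min(1.0, (self.current - self.baseline) / span))


def objective_progress(key_results):
    """Average progress across an objective's key results."""
    return sum(kr.progress() for kr in key_results) / len(key_results)


# Hypothetical measurements for the customer service objective.
customer_service = [
    KeyResult("call wait time (minutes)", baseline=120, target=60, current=90),
    KeyResult("customer service staff (index)", baseline=100, target=120, current=110),
    KeyResult("satisfaction rate (%)", baseline=70, target=95, current=75),
]

for kr in customer_service:
    print(f"{kr.name}: {kr.progress():.0%}")
print(f"objective: {objective_progress(customer_service):.0%}")
```

Averaging treats every key result as equally important; a weighted average would be a reasonable alternative if some results matter more than others.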
AI recommended plan
To fully resolve the issues outlined in the OKR and achieve the key results within 90 days, we will need to implement a number of innovative solutions. Here are some specific recommendations:
Leverage AI technologies to improve scheduling and reduce manual efforts: We will implement an AI-powered scheduling system that can handle the demands of current operations and significantly reduce the need for manual scheduling efforts. This will improve accuracy and efficiency, and free up staff to focus on other tasks.
Increase staffing levels and improve training: By increasing staffing levels and providing additional training to customer service personnel, we can improve our ability to handle high levels of call volume and increase customer satisfaction. We will also ensure that all staff are well-trained in FAA regulations on flight crew work hours, to improve compliance and reduce disruptions.
To fully resolve the OKRs within 90 days, it would be necessary to implement a comprehensive training program for staff. This program should focus on the specific skills and abilities needed to improve customer satisfaction, reduce flight cancellations and lost baggage, and increase on-time departures.
One way to achieve this would be to leverage AI technologies, such as machine learning and natural language processing, to provide real-time analysis of customer feedback and flight data. This would allow staff to quickly identify and address problem areas, such as bottlenecks in the system or areas where customer satisfaction is low.
In addition, it would be helpful to utilize liberating structures, such as open space technology and human laboratory, to facilitate collaboration and idea-sharing among staff. This would allow for the co-creation of solutions to challenges faced by the company, and would help to foster a sense of ownership and empowerment among staff.
Finally, it would be important to align this training program with Southwest's values, such as a commitment to safety and customer satisfaction. By incorporating these values into the training program, staff will be better equipped to uphold them in their daily work and create a positive, customer-focused culture at the company.
Review and adjust scheduling practices: By reviewing and adjusting our scheduling practices, we can identify and resolve bottlenecks in the system that are causing delays and cancellations. This will help us achieve higher levels of on-time performance and reduce cancellations.
Use liberating structures and teal cultures: To facilitate collaboration and innovation, we will adopt a number of liberating structures and teal culture practices. These include open space technology, world cafe, and sociocracy, which will help us tap into the collective wisdom and creativity of our team.
Leverage Southwest's values: As a company, Southwest has a strong commitment to customer service and employee satisfaction. We will leverage these values in our plan by focusing on improving the customer experience and increasing employee satisfaction. This will help us create a positive, supportive culture that is aligned with our company's mission and vision.
Detailed plan
Week 1:
Consult with software engineers and experts in airline scheduling systems to assess the current state of Southwest's systems, including SkySolver, Crew Web Access, and SkySYM.
Identify the root causes of the recent system failures, including any technical issues or limitations with the current systems, and assess the feasibility of implementing new software or technology to address these issues.
Engage with Southwest's IT team and management to assess the resources and support available for implementing new solutions, and to establish a timeline and budget for addressing the identified issues.
Week 2:
Develop a plan for implementing new software or technology to address the identified root causes of the system failures, including the integration of new scheduling and network simulation systems, such as Optym's SkyMAX and SkySYM.
Work with Southwest's IT team to ensure that the new solutions are compatible with the existing systems and infrastructure, and to develop a plan for rolling out the new technology to all relevant departments and personnel.
Engage with airline industry experts and technology vendors to identify best practices and potential partnerships for optimizing the new systems and improving overall operational efficiency.
Week 3:
Begin the process of implementing the new scheduling and simulation systems, including any necessary training and support for relevant personnel.
Work with Southwest's IT team to ensure that the new systems are properly integrated and tested, and to identify and address any issues or challenges that arise.
Engage with relevant departments and personnel to gather feedback and input on the new systems, and to identify any additional opportunities for improvement or optimization.
Week 4:
Continue the process of implementing and testing the new scheduling and simulation systems, including any necessary updates or adjustments based on feedback and testing results.
Engage with Southwest's management and IT team to assess the progress and impact of the new systems, and to identify any additional steps or resources needed to fully optimize their performance.
Begin the process of rolling out solutions that use AI, AI-generated code, and adaptive agile coaches.
Monitor and track progress on key results, including on-time departures, customer satisfaction, and employee satisfaction.
Continue training and implementation of new scheduling software, with a focus on achieving 95% accuracy in automated scheduling.
Further optimize routes and reduce the need for manual scheduling efforts using AI technology.
90 Days:
Achieve 95% accuracy in automated scheduling and reduce manual scheduling efforts by 75% using the new scheduling software and AI technology.
Increase staffing levels by 10% and achieve 95% compliance with FAA regulations on flight crew work hours.
Increase customer service personnel by 20% and achieve a 95% satisfaction rate for customer service, with a 50% reduction in call wait times.
Review and adjust scheduling practices to reduce cancellations by 50% and achieve 95% on-time performance.
AI Analysis of Initial Plan
Overall, the likelihood of achieving these key results within 90 days is high if the following actions are taken:
Implement a new scheduling software within the first week of the plan. This software should be chosen based on its ability to handle the demands of current operations and reduce the need for manual scheduling efforts.
Increase staffing levels by 10% within the first week of the plan. This will improve the ability to adjust to increased system stress and recover from delays or disruptions.
Increase customer service personnel by 20% within the first week of the plan. This will enhance customer service by increasing personnel and improving training for handling high levels of call volume.
Review and adjust scheduling practices within the first week of the plan. This will help to reduce bottlenecks in the system and improve on-time performance.
Leverage AI technologies, such as machine learning and natural language processing, to improve the accuracy and efficiency of scheduling and customer service processes.
Utilize liberating structures and teal cultures to empower employees and foster collaboration and innovation within the organization.
Align the plan with Southwest's values of customer service and efficiency, and regularly track progress and make adjustments as needed to ensure that the key results are achieved.
By taking these actions, the likelihood of achieving the key results within 90 days is high. However, it is important to note that there may be unforeseen challenges or obstacles that could impact the success of the plan. It will be important to remain flexible and adaptable in order to overcome any challenges that may arise.
DETAILED POSSIBLE CODING OUTPUTS FROM AI
I can provide general information about how one might go about gathering a dataset for a machine learning model.
To gather a dataset for a machine learning model, one would typically need to define the specific features or variables that will be used as inputs to the model. These features could include things like the type of system, the time of day, and the specific error messages that were generated.
Once the features have been defined, the next step would be to collect data on these features for a large number of past system failures. This could be done through a variety of methods, such as manually reviewing log files or using automated tools to extract data from various sources.
Once the data has been collected, it will need to be cleaned and preprocessed to ensure that it is in a format that can be used by the machine learning model. This might involve tasks such as removing missing or invalid data, converting data into a numeric format, or normalizing the data to ensure that all features are on a similar scale.
Finally, the cleaned and preprocessed data can be used to train the machine learning model, using techniques such as supervised learning or unsupervised learning. The goal of this training process is to find patterns in the data that will allow the model to accurately predict future system failures based on the input features.
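The cleaning and preprocessing steps described above (removing missing data, converting to a numeric format, normalizing) can be sketched in plain Python. The records, the numeric `hour` column, and the helper names below are hypothetical and chosen only to illustrate each step.

```python
def clean(rows, required):
    """Drop rows where any required field is missing or blank."""
    return [r for r in rows if all(r.get(k) not in (None, "") for k in required)]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}={c}"] = 1 if r[column] == c else 0
        encoded.append(new)
    return encoded


def min_max(rows, column):
    """Rescale a numeric column to the range [0, 1]."""
    values = [float(r[column]) for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
    return [{**r, column: (float(r[column]) - lo) / span} for r in rows]


# Hypothetical raw records; 'hour' is an assumed numeric feature.
raw = [
    {"system": "scheduling", "hour": 8, "failure": 1},
    {"system": "network simulation", "hour": 17, "failure": 0},
    {"system": "", "hour": 12, "failure": 0},  # missing system: dropped
    {"system": "scheduling", "hour": 23, "failure": 1},
]

rows = min_max(one_hot(clean(raw, ["system", "hour"]), "system"), "hour")
for r in rows:
    print(r)
```

In practice a library such as pandas or scikit-learn would handle these steps, but the order shown here (clean, encode, scale) is the same.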
LIKELY BASIC AI MODEL
import pandas as pd
import joblib
import xgboost as xgb
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load in dataset of past system failure data
failure_data = pd.read_csv('failure_data.csv')

# Extract relevant features for training the model
features = ['system', 'time_of_day', 'error_message']
# One-hot encode the text columns, since XGBoost needs numeric inputs
X = pd.get_dummies(failure_data[features], dtype=int)

# Extract target variable (assumes the CSV has a 0/1 'failure' column)
y = failure_data['failure']

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train machine learning model using XGBoost
xgb_model = xgb.XGBClassifier(objective='binary:logistic', random_state=42)
xgb_model.fit(X_train, y_train)

# Evaluate model performance on test set
y_pred = xgb_model.predict(X_test)
print('Accuracy: ', accuracy_score(y_test, y_pred))

# Save trained model for later use
joblib.dump(xgb_model, 'failure_prediction_model.pkl')

What would be in 'failure_data.csv'? Write example data as if gathered from Southwest, in relation to the data needed for the crisis.
failure_data.csv
system,time_of_day,error_message
scheduling,morning,system overloaded with too many requests
scheduling,afternoon,system crashed due to unknown error
network simulation,evening,database connection lost
network simulation,morning,maintenance activities caused system outage
scheduling,evening,system unable to process changes made by employees
network simulation,afternoon,system unable to process weather data
scheduling,morning,system unable to access crew information
network simulation,evening,system unable to process ATC events
scheduling,afternoon,system unable to process passenger information
network simulation,morning,system unable to process maintenance activities
network simulation,evening,system unable to process aircraft flows
scheduling,afternoon,system unable to process crew delays
network simulation,morning,system unable to process passenger flows

Continue this as if, and format it as a table.
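Before any model is trained, a simple count by system and by time of day already shows where failures cluster. The sketch below uses only the Python standard library and the first five sample rows from above. The function name `count_failures` is an assumption for illustration, and the rows remain made-up example data.

```python
import csv
import io
from collections import Counter

# First five rows of the illustrative sample above (not real Southwest logs).
SAMPLE = """system,time_of_day,error_message
scheduling,morning,system overloaded with too many requests
scheduling,afternoon,system crashed due to unknown error
network simulation,evening,database connection lost
network simulation,morning,maintenance activities caused system outage
scheduling,evening,system unable to process changes made by employees
"""


def count_failures(csv_text):
    """Count failure records per system and per time of day."""
    by_system, by_time = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_system[row["system"]] += 1
        by_time[row["time_of_day"]] += 1
    return by_system, by_time


by_system, by_time = count_failures(SAMPLE)
print(by_system.most_common())
print(by_time.most_common())
```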
Here is an example of the data that could be in the 'failure_data.csv' file:
System              Time of Day   Error Message
Scheduling          12:00 PM      Connection timeout
Network Simulation  5:00 PM       Out of memory error
Scheduling          8:00 AM       Null reference exception
Network Simulation  11:00 PM      Division by zero error
Scheduling          3:00 PM       Disk space full
Network Simulation  7:00 AM       Invalid input
It is difficult to determine which specific machine learning model would be the most useful for this type of dataset without more information about the specific characteristics and patterns in the data. Some potential models that could be considered for analyzing this type of data include decision trees, random forests, gradient boosting, and neural networks. These models all have the ability to process large amounts of data and identify patterns and trends, but the best model for a specific dataset will depend on the specific characteristics and patterns present in the data.
Here is an example of how you could use a machine learning model to analyze the failure data. First, we will need to import the necessary libraries and load the failure data into a Pandas DataFrame:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

failure_data = pd.read_csv('failure_data.csv')

Next, we will need to prepare the data for training. This could involve selecting the relevant features and labels, handling missing or invalid data, and splitting the data into training and testing sets:

X = failure_data.drop(['failure', 'error_message'], axis=1)
y = failure_data['failure']
X = pd.get_dummies(X)
X = X.fillna(X.mean())
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Now we can train a Random Forest classifier on the training data:

classifier = RandomForestClassifier(n_estimators=100, random_state=42)
classifier.fit(X_train, y_train)

Finally, we can evaluate the model's performance on the testing data:

y_pred = classifier.predict(X_test)
accuracy = classifier.score(X_test, y_test)
print('Accuracy:', accuracy)

This is just one example of how you could use a machine learning model to analyze the failure data. There are many other options and techniques that you could consider, depending on the specific needs and goals of your analysis.

It would be helpful to also consider additional factors that might have an impact on the reliability of Southwest's flight schedules, such as maintenance issues, delays caused by airport infrastructure, and the impact of external events such as severe weather or geopolitical tensions. Additionally, it would be useful to gather data on the performance of Southwest's competitors in order to benchmark the company's performance and identify areas for improvement.
In order to effectively train a machine learning model on this dataset, it may also be necessary to perform feature engineering in order to extract relevant features and ensure that the data is properly formatted for input to the model. Finally, it will be important to select an appropriate evaluation metric in order to measure the performance of the model and determine whether it is able to accurately predict the root causes of system failures.
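On the choice of evaluation metric: when failures are rare, accuracy alone can look good even for a model that never predicts a failure. The sketch below computes precision, recall, and F1 by hand for two hypothetical sets of predictions. The labels are made up for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Define each ratio as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1


# Hypothetical labels: 2 failures in 10 observations.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
never_alerts = [0] * 10                         # always predicts "no failure"
sometimes_right = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]  # one hit, one miss, one false alarm

print(precision_recall_f1(y_true, never_alerts))     # (0.8, 0.0, 0.0, 0.0)
print(precision_recall_f1(y_true, sometimes_right))  # (0.8, 0.5, 0.5, 0.5)
```

Both sets of predictions score 80% accuracy, but only the second catches any failure, which is why recall or F1 is usually a better fit for failure prediction.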
Here is a script for training a machine learning model on the failure data dataset using a random forest classifier:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the failure data into a pandas DataFrame
failure_data = pd.read_csv('failure_data.csv')

# Select the features to use for training
features = ['system', 'time_of_day', 'error_message']
# One-hot encode the text columns, since the classifier needs numeric inputs
X = pd.get_dummies(failure_data[features])

# Select the target variable (assumes the CSV has a 'failure' column)
y = failure_data['failure']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the random forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the model to the training data
clf.fit(X_train, y_train)

# Make predictions on the test data
predictions = clf.predict(X_test)

# Calculate the accuracy of the model
accuracy = clf.score(X_test, y_test)
print(f'Model accuracy: {accuracy:.2f}')

This script first imports the necessary libraries and loads the failure data into a pandas DataFrame. Then, it selects the features to use for training (in this case, the 'system', 'time_of_day', and 'error_message' columns), one-hot encodes them so the classifier receives numeric inputs, and selects the target variable (the 'failure' column). The data is then split into training and test sets using the train_test_split function. Next, the script initializes a random forest classifier and fits it to the training data using the fit method. It then makes predictions on the test data using the predict method and calculates the accuracy of the model using the `
Once the data has been collected and cleaned, the next step would be to train a machine learning model on the dataset. This could involve using techniques such as supervised learning, unsupervised learning, or reinforcement learning, depending on the specific goals of the project and the characteristics of the dataset.

For example, if the goal is to predict future system failures based on past data, a supervised learning approach might be used, where the model is trained on labeled data (e.g. past system failures) and then used to make predictions on new, unseen data. On the other hand, if the goal is to identify patterns or trends in the data that might not be immediately apparent, an unsupervised learning approach might be more appropriate, where the model is not given any labeled data and is instead used to cluster or group the data based on similarities.

Once the model has been trained, it can then be tested and evaluated to ensure that it is performing accurately and effectively. If necessary, the model can be fine-tuned or modified to improve its performance.

After the model has been trained and tested, it can then be deployed in production, where it can be used to analyze real-time data and identify potential issues or problems before they occur. This could involve integrating the model into the existing systems and processes at Southwest Airlines, or creating a standalone application that can be accessed by relevant staff.
Here is an example of how the code for integrating the machine learning model into the existing systems and processes at Southwest Airlines might look in Python:

# Import necessary libraries
import pickle
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the failure data into a pandas DataFrame
failure_data = pd.read_csv('failure_data.csv')

# One-hot encode the text features so the classifier receives numeric inputs
X = pd.get_dummies(failure_data.drop('failure', axis=1))
y = failure_data['failure']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the random forest classifier on the training data
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Test the model on the testing data
predictions = clf.predict(X_test)

# Evaluate the model's performance
accuracy = accuracy_score(y_test, predictions)
print(f'Model accuracy: {accuracy*100:.2f}%')

# Save the trained model to a file
with open('failure_model.pkl', 'wb') as f:
    pickle.dump(clf, f)

# Integrate the model into the existing systems and processes at Southwest Airlines
# Load the trained model from the file
with open('failure_model.pkl', 'rb') as f:
    clf = pickle.load(f)

# Define a function to predict the likelihood of a system failure based on input data
def predict_failure(

With this approach we can continue to iterate with AI and Southwest toward a rapid solution combining human performance engineering, psychology, and AI systems analysis and coding. This is a set of techniques well beyond agile in speed and resolution ability.

