Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowchart; it’ll be obvious.
We’ve chosen to use relational databases, specifically PostgreSQL, for storing some of our data. We like ACID, we like the ease of ad-hoc queryability, and we like the fact that databases add an additional layer of security and data quality control. To make the most of this we should adopt some conventions, so that when we are accessing PG from an ORM we don’t bring too many ORM-isms into our data model. This ensures that other staff, who might be using other ORMs, can still work with our data, and it also prevents us from relying too much on the ORM or application layer to do work that an RDBMS can already do.
These are my preferred conventions; we will evolve them over time. I’ll try to provide a justification for each one, but this is a discussion.
- All names (table, column, sequence, index, constraint, role, etc.) should be lowercase with underscores. Postgres does support AnYSortOF casing that you’d like, but it makes manual querying painful.
- Table names should be a singular noun that describes one row: “account”, not “accounts”. Some people prefer plural; we just need a standard. My vote is for singular, as it makes SQL a little more natural to read, e.g., “SELECT * FROM account WHERE account.balance > 5000;”.
- We’re using a relational database. Have relations. Very few tables should be islands.
- Foreign keys should be named “<table>_id”, e.g., if the “account” table links to the “person” table, there should be a column in “account” called “person_id”. Where there are multiple foreign keys to the same table, prefix the ids, e.g., “from_person_id” and “to_person_id”.
- Foreign keys must have foreign key constraints. It makes the schema more readable, both by humans and introspection tools. It also prevents mistakes at the application layer.
- Serial columns should have the sequence as the default value for that column. E.g., if the “account” table has a primary key of “id”, it should be defined (in SQL) as “id SERIAL PRIMARY KEY”, which is a shortcut for “INTEGER NOT NULL DEFAULT nextval(‘account_id_seq’)”.
- Never expose serial columns outside of the model layer. If any table is going to be exposed in any way via an API, it should have a UUID column that will be exposed instead of using the “id”.
- Index, constraint and sequence names should take the form table_column_[idx | uidx | seq | ck], for indexes, unique indexes, sequences and check constraints respectively.
- Unique indexes should encompass all the rules for uniqueness. If the “user” table can only have one copy of each user, consider a unique constraint on first_name, last_name, address and zip, and another on SSN, or whatever fits. There is nothing wrong with having the front end, back end and database all check this.
- Constraints should reflect business rules. Just because your application does sanity checking doesn’t mean that some bozo at a terminal can’t bypass it. FYI Josh has access to most of our machines and is a bozo.
- Postgres has a rich selection of native types (IP Addresses, UUIDs, Time intervals, Polygons). Use them where appropriate. If your data is an IP address, stick it in an INET. If it is a UUID, there’s a type for that.
- Postgres also supports enumerated types. If we have a relatively immutable small list of possible values for a column, use an enum.
- If the SQL type is not descriptive enough of the data stored in a column, put the units of measurement in the column name. E.g., “height_meters” if we are storing a height, in meters. God knows why we’d do that, but you get the idea.
- Don’t be afraid of TEXT. If you want to store free-form text, VARCHAR(2048) isn’t what you want. Postgres is smart enough to move large chunks of text out of the table and into separate storage, so TEXT costs nothing extra over VARCHAR. If there are strict length constraints, don’t use TEXT.
- Don’t be afraid of NUMERIC. We are dealing with money. Bigints are fine, but then we need to rely on the application layer to do the right thing: each application needs to know what 12345 means in dollars. When we start having interest-bearing accounts, 4 decimal places may not be enough. Postgres supports arbitrary-precision numbers, so we should standardize on NUMERIC(18,6) for money. And please be sure that your application doesn’t silently translate arbitrary-precision numbers into IEEE-754 floats or similar. We all saw Superman III.
- Set reasonable DEFAULTs. If you have a column called “created” which records when a row was created, a reasonable default would be now(). I saw this as a default on one of our tables: “not null default ''::character varying”. Not reasonable. If it’s not supposed to be null, setting the default to '' is silly. At the very least, decide whether each column should be NULL.
- Don’t be afraid of schemas. Postgres supports multiple object namespaces within the same database. If you’re unaware of schemas, you are probably creating objects in the “public” schema. If we ever get to the point where a database has dozens of tables, schemas are a good way to clarify the roles of each table. Look into it.
- By default, don’t denormalize. At our scale, it’s bad form to have the same column in two tables that are joined by a 1:1 relationship. Avoiding this means less logic at the application layer to enforce consistency.
- Many-to-many join tables should be named with the names of the two tables they join.
- Log modifications. If tables have mutable columns, provide a _history table that keeps track of changes. If you’re so inclined, you can do this with triggers.
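To make the list concrete, here is a sketch of what several of these conventions look like together. The table and column names are invented for illustration, and gen_random_uuid() assumes Postgres 13 or later (earlier versions need the pgcrypto extension):

```sql
CREATE TYPE account_status AS ENUM ('open', 'frozen', 'closed');

CREATE TABLE person (
    id         SERIAL PRIMARY KEY,
    uuid       UUID NOT NULL DEFAULT gen_random_uuid(),  -- expose this via the API, never id
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    ssn        TEXT NOT NULL,
    created    TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX person_uuid_uidx ON person (uuid);
CREATE UNIQUE INDEX person_ssn_uidx ON person (ssn);

CREATE TABLE account (
    id        SERIAL PRIMARY KEY,
    uuid      UUID NOT NULL DEFAULT gen_random_uuid(),
    person_id INTEGER NOT NULL REFERENCES person (id),  -- a real FK constraint, not just a column
    status    account_status NOT NULL DEFAULT 'open',
    balance   NUMERIC(18,6) NOT NULL DEFAULT 0,
    created   TIMESTAMPTZ NOT NULL DEFAULT now(),
    CONSTRAINT account_balance_ck CHECK (balance >= 0)  -- a business rule the bozo can't skip
);

CREATE INDEX account_person_id_idx ON account (person_id);
```

An account_history table plus an ON UPDATE trigger would round this out for the mutable columns; I’ve left that out for brevity.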
I threw this slide up as part of my talk yesterday at the IA Ventures Big Data conference. The talk was titled "Big Data with Small Data" and was my attempt at describing how we at BankSimple apply big data techniques to relatively small data sets.
It was a short talk and while this was my key slide, I didn’t have as much time to discuss it as I would have liked. What do you think?
I’ve spent the last 37 minutes trying to send $104 to a friend for tickets to a party. He specifically asked me to send him the money via PayPal. I have a PayPal account, so it shouldn’t have been a problem.
The problems began when I first attempted to send him the money. PayPal complained that they weren’t able to confirm that I owned the account. I’m currently on a business trip and using the hotel’s internet connection, so I figured the odd IP address was confusing PayPal. None of the suggestions they offered seemed likely to fix the problem, so I decided to verify my account by linking PayPal directly to my checking account.
My checking account is with USAA, and linking the two required providing PayPal not only with the username and password to my bank account, but also the answers to three security questions and my card’s PIN. It took me a good 10 minutes of switching back and forth between USAA and PayPal to sort that out. (It’s more difficult than it should be. Why don’t banks support OAuth?)
By linking the two accounts I assumed that I would have provided PayPal with enough evidence that I owned my account. This wasn’t the case.
I also re-verified my email and set up and verified my phone number with PayPal. It took more than 5 minutes for their verification SMS to reach my phone, which currently flutters between one bar and ‘Searching…’ while I’m out here in the Midwest. Not PayPal’s fault, but clunky.
None of this helped convince PayPal that I was who I said I was. If I gave anyone else all the information I had just shared with them, they could walk away with my money. But I couldn’t send $104 to pay for party tickets.
It’s two thousand and ten. Money is electronic. Sending money between American banks, while clunky, is cheap(*) and easy. Doing it via PayPal is hard because PayPal supports international transfers and thus, rightfully, expends more effort fighting fraud than it does sending money. Horror stories like mine are common. But while PayPal could certainly improve their web interface, the majority of the experience failures are due to their interminable vigilance against fraud.
(*) Cheap for the banks. While it may cost them a tiny fraction of a penny, they will readily charge customers far more.
Although I’m really a C programmer, I’ve been doing more work over the past three years in Python. Today, for the first time, I decided to write a digital signal processing program in Python. C is usually my go-to language for these types of tasks, and I felt like a fish out of water.
If you are so inclined, you can check out the code to my spectrum analyzer as a gist. There are a few sections where the code reads more like what a C programmer would write than that of a native Python programmer. I was having trouble clearly expressing the following line of code:
[ord(s[2*i]) | ord(s[2*i+1])<<8 for i in range(0,len(s)/2)]
Essentially, that code did exactly what I wanted, but I feel that there was probably a simpler way of expressing the same intent in Python. I asked my followers on Twitter for some help with the above segment, and got some useful answers. I then re-phrased the exact same question, exposing my intent, as:
Do you know of an inbuilt way to convert a byte stream containing unsigned 16 bit integers into an array of python ints?
“convert a byte stream containing unsigned 16 bit integers into an array of python ints”. Seventy-two characters to type. The code I was using consumed fifty-three characters. In some way, the code was 25% more efficient at expressing the underlying intent. And that brings me to the moral of this blog post.
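For the record, one idiomatic answer to that rephrased question uses struct from the standard library (Python 3 syntax here; my original snippet was Python 2, and I’m assuming the bytes are little-endian):

```python
import struct

def u16s_from_bytes(s):
    """Decode a byte string of little-endian unsigned 16-bit integers."""
    # "<" forces little-endian; "H" is an unsigned 16-bit integer.
    return list(struct.unpack("<%dH" % (len(s) // 2), s))

u16s_from_bytes(b"\x01\x00\xff\x00")  # [1, 255]
```

The array module (typecode "H") works too, with a byteswap() on big-endian machines.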
I woke up this morning to a flurry of news stories about an SEC proposal to include Python code with asset-backed securities (ABS) filings. The idea is that while ABS documents are chock full of legalese, a computer program can provide a very concise way of understanding how a financial instrument is supposed to operate. I really liked the idea, but needed to know more.
This recommendation comes in a 667-page PDF. I just finished scanning through it, trying to find more details about the proposed implementation. You see, I spent a good chunk of last week writing a retail banking simulator in Python, and I have some questions about how they intend to do it. Of course, completely missing the point of their very sensible recommendation, nowhere in the document is there any Python code. Rather than making me go through hundreds of pages of text, I would have really appreciated a link to hundreds of lines of code.
Oh well. Their heart is in the right place.
So, why am I writing a retail banking simulator in Python? Well, at BankSimple we have an ever-growing Excel spreadsheet. Given the limitations of Excel, we make lots of broad assumptions about the distributions of things like account balances and daily spending. Given the non-linearities of both our business rules and human behavior, I want to get a sense of the sensitivity of our model to various risks. And the best way I know to do that is via simulation.
Rather than hiding code throughout gnarly cell references, I can clearly express business rules and customer responses in code, and from there I can tweak inputs and assess the impact of distribution assumptions on our revenue model. Essentially, I build a universe with millions of bank customers, let them do the things that people do with money for a few years, and see what happens. This is a very different approach to modelling in Excel. It is much better at capturing non-linearities.
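To give a flavor of what I mean, here is a toy sketch, not our actual model; the distribution, fee and parameter values are all invented for illustration:

```python
import random

def simulate_revenue(n_customers=1000, months=24, fee_per_txn=0.25,
                     mu=2.5, sigma=0.8, seed=1):
    """Toy Monte Carlo revenue model. Each customer's monthly transaction
    count is drawn from a log-normal distribution, so the heavy tail of
    real spending behavior is modeled rather than averaged away."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_customers):
        monthly_txns = rng.lognormvariate(mu, sigma)  # skewed, heavy-tailed usage
        total += monthly_txns * months * fee_per_txn
    return total

# Sensitivity check: rerun with a different spread assumption and compare.
# A single-average spreadsheet model would report no change at all.
base = simulate_revenue(sigma=0.8)
fat_tail = simulate_revenue(sigma=1.2)
```

Swap in real business rules (overdrafts, interchange tiers, churn) for the one-liner fee model and the same skeleton lets you stress any assumption directly.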
I wanted to know whether the SEC proposal was for including full simulations of securities that can be composed of thousands of other instruments, or whether it was more like an Excel model, just written in Python. The answer is probably somewhere in those 667 pages, but I can’t find it.
Sorry I haven’t been blogging here recently. We are quite busy getting things ready with BankSimple. You can follow along over at our blog at http://banksimple.net/blog
A decade ago I was pitching web analytics software to a number of retail banks back in Australia. The pitching process was intense. Not only did we have to explain what web analytics was all about to marketing executives who had just discovered the internet, but we also had to get past operations teams who were held to a five-nines standard. Five nines, or 99.999% uptime, means six seconds of downtime per week. This attitude pervades the development of banking products, both in ways that are beneficial to the user and in ways that make what should be simple experiences painful.
Running a large teller machine network requires a serious investment in uptime. With thousands of transactions running through the American card networks each second, the fallout of losing even a fraction of a percent is too serious a risk to face light-heartedly. As a whole, the network is amazingly reliable. But point-of-sale machines don’t adhere to the same reliability levels as the network as a whole. The machines that read your card swipes at stores are subject to power failures, phone network outages, or even running out of paper, long before the system-wide network comes close to failing. In the event of a single failure at your local bodega, affected customers have the option of walking across the street to use an ATM or making some other minor adjustment to cope with a local problem. If the network as a whole were to go down, even for a short period of time, there would be chaos.
An individual POS machine meets local demand at 99.9% availability. This is how the system is meant to work. Likewise, the internet. The internet is designed to be fault tolerant, with the realization that it is cheaper (and often plain better) to tolerate failure instead of going to great lengths to avoid it. Your POS terminal could reach four nines if it were housed in a fancy co-location center, but that would hardly be convenient. Yet when we were pitching marketing analytics to the banks, we got the impression that if banking technologists had their way, all bodegas would have multi-homed candy aisles and drip coffee machines with backup generator power.
While this attitude has shifted since the formation of independent internet banking groups as distinct from core banking operations, tight coupling to legacy systems results in a development process that is just completely wrong. Banking is complex. Retail operations are tightly regulated. Applying this mentality to every aspect of banking leads to unnecessary inflexibility. Yet successful online experiences are defined by development processes that rapidly iterate and evaluate. The idioms for online interactions are rapidly changing – and at this point in our short history of internet usage it is difficult to see many points of convergence. This is the zeroth hour, and iteration begets discovery of new interactions and the continuing evolution of a common language for working with the web.
As many of you know, I’m working with a great group reinventing retail banking. A big part of what we want to get right is our user experience. In researching this project, I’ve signed up with a bunch of different banks across America. The typical process for doing so heavily reflects the forms you would fill out and the experience you would have opening an account at a branch. While many of these form fields are required by law, the end user experience is heavily informed by legacy development processes. Branches were satellite offices, connecting to mainframes via expensive WANs. The process feels unwieldy to new customers as their applications move in lock step through a system that was designed as a series of incremental improvements over the pen-and-paper-driven process of just two decades ago.
Most banks operate their core transactional processing systems on a batch cycle. This stems from the way the Fed works with banks for overnight lending. And as the internet groups budded off from core banking operations, the process of institutional mitosis resulted in a bunch of useless DNA being carried over. One of the banks I have accounts with (a top 5 US bank) regularly closes their internet banking site on Sunday nights for scheduled maintenance. When did Facebook last shut down for maintenance, scheduled or otherwise?
One thing we are working very hard towards is the distinction between the parts of our banking service that must be highly reliable and available and those where we can iterate quickly. For example, take call center operations. A good phone system, simply due to variability in customer demand, will need staffing slack to reach a given quality of service – in terms of expected time on hold and time to resolution. You can spend a tonne on reaching a certain level of technological availability but have little impact on key measures of customer satisfaction due to the different costs involved in staffing. With a five-nines phone system you can still deliver a shitty service if your call center capacity is capped out at 100% staff utilization.
When making decisions about the technology behind a retail bank, such as the call center or web site, we choose iterability over early wins in the cost of scaling. Large, complex and thus rigid systems make it difficult to evaluate competing operating procedures. Short-sighted metrics for success lead to short-sighted incremental improvements. Free from the constraints of public markets, we are able to take a risk and try something different – even if we don’t know, a priori, how different it has to be. We believe it is critical to be able to try new things quickly, learn from our customers and improve their experience based on data.
I’m always scared to have our audience ask “What about feature X?” during our early technology demos. More often than not, “feature X” is already in our feature tracking system, but marked as “Will not fix.” Even very simple features that would take only minutes of development time to implement have far-reaching consequences. If we added the ability to filter transactions by date, for example, a number of quick decisions would have to be made about how to implement the user interface – with no possible implementation resulting in an interface less complex than simply leaving the feature out.
Additional complexity, without a clear use case, is bad. The flexibility to add new features as they are justifiably demanded is good. Complex systems work best when they are adaptive. Designing new features in bulk and dumping them on users after a 12-month development cycle is just cruel. Especially so in banking, where mistrust is rampant and fear of making a mistake is justified. Better to iterate quickly and support an adaptive complexity landscape.
People often ask us what it is that makes us better than other banks. Glibly, I respond that we are just a plain old retail bank – but we don’t suck. Not sucking is our killer app.
I don’t know what that means in terms of fine-grained details for future features. Sure, we have prior beliefs as to which experiences suck more than others under the current banking model, and where we should appropriately spend our valuable time. But we are also fine with being wrong - hell, we expect it. The only thing we believe is that by setting ourselves up to respect and learn from our customers’ experience, we win. Other banks just can’t do that.
Photo by: riv / flickr
According to Mr. Meara, 90 percent of all transactions with bank tellers involve checks. If everyone had an iPhone deposit app, people wouldn’t come into the branch as often. That would be fine had banks not invested so much time and energy in training branch workers to persuade checking account customers to move into more profitable products.
"On the one hand, fewer deposit transactions could mean a headcount reduction," he said. "But it invites the erosion of store profitability. The banks are struggling with the enormity of what it means."
“Hurry up & credit my account,” New York Times, September 18, 2009
Brand apathy is rampant in retail financial services. The number one sales channel is the branch. That expensive, increasingly empty, retail space is how banks sell new products to customers. “Is there anything else that I can help you with today” is the alpha and omega of retail bank marketing, with a small epsilon for banners of smiling families plastered inside branches.
Below is a screenshot of what I see when I log into my American Express account. A small portion of the screen is stuff I care about. A small, but significant, portion of my emotional well being rides on those numbers. The bulk of the page is dedicated to selling me stuff. How about this? Make understanding and working with my money easier - make me happier, and then I’ll be far more receptive to upsells. But if you can’t even get basic information and interaction right, then I’m too busy worrying about my current state of affairs to consider newfangled products and the incremental complexity they entail.
As someone who has spent most of their working career selling data to advertisers, I’m surprised by the number of businesses that are predicated on the model of selling data to advertisers. If you have a great widget, it is easy to get it into the hands of millions of people, especially if you are giving it away for free. As programmers and scientists we deify data. What we don’t do is understand advertising. Sure, we understand that advertising is about selling stuff, but we don’t seem to get that the advertising industry exists to sell ways of selling stuff.
Whether an external agency or an internal group, advertising professionals have to convince others that they are adding value. And if your model is simply to sell data to advertisers, you have to convince them that your data is worth at least as much as what you are selling it for. Your data needs to be useful. Smart techniques are popping up for doing interesting things with large amounts of data. But interesting isn’t always useful. If you think your customer data is useful, you should use it to make your product better. Data that is valued by your users for the richer experience it provides is likely to be valued by others. If you can’t improve your customers’ experience with their own data, your data is worth nothing.
While I’m ranting, let me ask you something, Randall. At the risk of sounding like Glenn Beck Jr. - what the fuck has gone wrong with our country? Used to be, we were innovators. We were leaders. We were builders. We were engineers. We were the best and brightest. We were the kind of guys who, if they were running the biggest mobile network in the U.S., would say it’s not enough to be the biggest, we also want to be the best, and once they got to be the best, they’d say, How can we get even better? What can we do to be the best in the whole fucking world? What can we do that would blow people’s fucking minds? They wouldn’t have sat around wondering about ways to fuck over people who loved their product. But then something happened. Guys like you took over the phone company and all you cared about was milking profit and paying off assholes in Congress to fuck over anyone who came along with a better idea, because even though it might be great for consumers it would mean you and your lazy pals would have to get off your asses and start working again in order to keep up.
And not just you. Look at the Big Three automakers. Same deal. Lazy, fat, slow, stupid, from the top to the bottom - everyone focused on just getting what they can in the short run and who cares what kind of piece of shit product we’re putting out. Then somehow along the way the evil motherfuckers on Wall Street got involved and became everyone’s enabler, devoting all their energy and brainpower to breaking things up and parceling them out and selling them off in pieces and then putting them back together again, and it was all about taking all this great shit that our predecessors had built and “unlocking value” which really meant finding ways to leech out whatever bit of money they could get in the short run and let the future be damned. It was all just one big swindle, and the only kind of engineering that matters anymore is financial engineering.
The message is pretty clear - competition, the pillar of capitalism, results in better products and services for consumers. When lenders compete, you win. While this is the slogan for one major mortgage lead generator, the methodology is common to the industry as a whole. And people believe that magic technology fosters competition, with the net benefit of better lending rates.
The reality is a little different. When I oversaw the operations of a mortgage marketplace, the competition was not in terms of the products offered, but rather, the price paid for getting a person’s attention. Lenders would bid for leads and the lenders who paid the highest price received the most leads. Thus the incentives were counter to people’s rational goals. The lenders with the highest margins were able to spend the most on customer acquisition, while lenders with more affordable products were unable to reach the same audience.
Google recently publicized their direct entry into this space. Prior to their entry, they captured only a portion of the marketing dollars - with lead generators buying keyword ads on Google, funneling the traffic to their sites, collecting lead information and selling it to the highest-paying mortgage providers. When the lead generators’ spend on Google was lower than their revenue from the mortgage companies, they profited. A mercenary and highly unregulated bunch, the lead generators would go to great lengths to screw the consumer.
Google’s product appears better in that rather than selling out the consumer for the highest price, they display a targeted list of options - clearly outlining the competing offers - letting the consumer decide which companies to contact for a quote. As is always the case, transparency leads to a better outcome for people.
Despite the numerous and simultaneous failures in the mortgage marketplace that have so deeply scarred the American economy, one upside that is often forgotten is the standardization of lending products. Prior to the development of the mortgage-backed security market, mortgage contracts varied greatly in their structure and terms. And while they remain complex financial contracts, standardization means that a consumer is able to properly evaluate the bulk of the financial impact of their mortgage choice by simply examining a handful of parameters.
I have a home equity loan. I also have credit cards, savings accounts and brokerage accounts. The simplest account I have, and the one that sees the most action, is the humble checking account. My checking account has a 36-page introductory preamble that outlines its terms and conditions. These terms are fully documented in a corner of my bank’s web site, and change on a semi-monthly basis. No one reads these terms.
I spent my summer reading not only the terms of my checking account, but of all of my accounts and the accounts at other major banks in America. You’d be terrified to know what they actually contain. That is, if you could find them. The GAO found that 65% of banks do not make these documents available on the web, and 35% fail to produce them if you visit a branch.
And these terms matter.
Contrary to most people’s mental model of retail banking operations, banks do not make most of their money on the difference between the rate at which they lend and the rate they offer for savings. American banks, quite distinctly from banks elsewhere in the world, make the bulk of their money from fees and charges: invisible and often unavoidable consequences of little clauses in contracts that no one ever reads.
This stands in stark contrast to the message that we hear in bank marketing. Retail bank marketing is dominated by APR: Best rate savings! Lowest rate on credit cards! Yet the largest financial impact to the consumer is fees and charges.
Fees and charges that consumers have no hope of simply understanding.
Lead generation is rife across the financial product landscape. Some companies try to offer Google-like services to help consumers better choose financial products - but these services fall into the trap of ignoring the obscure and non-standardized terms that most impact financial well-being. And as such, no one believes the offers they see on sites like Mint.com. If people honestly believed they could “Save $2,000 by switching to Bank XYZ’s credit card,” then the conversion rates on these offers would be vastly better than the prevailing rates.
And so, with all the technology we have at our disposal, people are no better off. Banks have no incentive to increase transparency, lead generators have no incentive to provide real offers, and immense brand apathy prevails, resulting in short-sighted decisions that further drive down customer experience. The cycle continues.
Until it stops.